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Abstract 

The  Markov  chain  approximation  numerical  methods  are  widely  used 
to  compute  optimal  value  functions  and  controls  for  stochastic  as  well 
as  deterministic  systems.  We  extend  them  to  controlled  general  nonlin¬ 
ear  delayed  reflected  diffusion  models.  The  path,  control,  and  reflection 
terms  can  all  be  delayed.  Previous  work  developed  numerical  approxi¬ 
mations  and  convergence  theorems.  But  when  the  control  and  reflection 
terms  are  delayed  those  and  all  other  current  algorithms  normally  lead 
to  impossible  demands  on  memory.  An  alternative  “dual”  approach  was 
proposed  by  Kwong  and  Vintner  for  the  linear  deterministic  system  with 
a  quadratic  cost  function.  We  extend  the  approach  to  the  general  non¬ 
linear  stochastic  system,  develop  the  Markov  chain  approximations  and 
numerical  algorithms,  and  prove  the  convergence  theorems.  The  approach 
reduces  the  memory  requirement  significantly.  For  the  no-delay  case,  the 
method  covers  virtually  all  models  of  current  interest.  The  method  is 
robust  and  the  approximations  have  physical  interpretations  as  control 
problems  closely  related  to  the  original  one.  These  advantages  carry  over 
to  the  delay  problem. 
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1  Introduction 


The  paper  [6]  extended  the  numerical  methods  of  [10],  known  as  the  Markov 
chain  approximation  methods,  to  controlled  delayed  diffusion  models,  to  numer¬ 
ically  obtain  the  optimal  costs  and  controls.  The  state  of  the  model,  as  needed 
for  the  numerical  procedure,  consists  of  the  segment  of  the  path  over  the  delay 
interval  and  of  the  control  path  as  well  (if  the  control  is  also  delayed).  De¬ 
layed  reflection  terms  were  not  dealt  with.  Convergence  theorems  were  proved 
and  “numerically  efficient”  representations  of  the  state  data  were  developed 
that  reduced  the  memory  requirements  to  manageable  size  for  low-dimensional 
problems,  if  the  path  only  were  delayed.  If  the  control  and/or  reflection  terms 
are  also  delayed,  then  the  memory  requirements  with  any  current  method  is 
prohibitive. 

In  this  paper  we  will  take  an  alternative  approach  that  greatly  reduces  the 
memory  requirements  for  general  nonlinear  stochastic  problems  where  the  con¬ 
trol  and  reflection  terms,  as  well  as  the  path  variables,  are  delayed.  The  ap¬ 
proach  was  suggested  by  the  work  in  [13]  which  dealt  only  with  the  linear  deter¬ 
ministic  system  with  a  quadratic  cost  function,  and  the  development  depended 
heavily  on  the  linear  structure.  But  the  idea  can  be  extended  to  the  problem 
of  concern  here  and  has  numerous  advantages.^  With  this  method,  the  de¬ 
lay  equation  is  replaced  by  a  type  of  stochastic  wave  equation  with  no  delays, 
and  its  numerical  solution  yields  the  optimal  costs  and  controls  for  the  original 
model.  With  appropriate  numerical  algortihms  the  memory  requirements  are 
much  reduced  over  more  direct  methods. 

Delayed  reflection  terms  occur  frequently  in  applications  to  communications 
systems,  where  one  of  the  reflection  terms  corresponds  to  buffer  overflow.  This 
data  is  then  communicated  to  the  controller  via  a  transportation  delay.  In  fact 
such  problems  have  been  a  major  motivation  for  this  work  and  an  example  is 
given  in  the  next  section.  There  is  an  large  literature  on  the  delayed  linear 
system,  quadratic  cost,  and  white  Gaussian  noise  case  [2,  7,  9,  11,  13].  As  for 
the  no-delay  case,  the  analysis  is  essentially  the  same  with  and  without  driving 
noise.  The  problem  reduces  to  the  study  of  an  abstract  Ricatti  equation.  But 
little  has  been  available  for  the  general  nonlinear  stochastic  problem. 

The  basic  idea  of  the  numerical  method  is  to  first  approximate  the  con¬ 
trol  problem  by  a  control  problem  with  a  suitable  approximating  Markov  chain 
model,  then  solve  the  Bellman  equation  for  the  approximation,  and  then  prove 
the  convergence  of  the  optimal  costs  to  that  for  the  original  problem  as  the 
approximation  parameter  goes  to  zero.  The  method  is  robust  and  the  approxi¬ 
mations  have  physical  interpretations  as  control  problems  closely  related  to  the 
original  one. 

Models  for  many  physical  problems  have  reflecting  boundaries.  They  occur 
naturally  in  models  arising  in  queueing/communications  systems  [8],  where  the 
state  space  is  often  bounded  owing  to  the  finiteness  of  buffers  and  nonnegativity 
of  their  content,  and  the  internal  routing  determines  the  reflection  directions  on 

^The  author  would  like  to  thank  Kasi  ltd  for  bringing  the  paper  to  his  attention. 
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the  boundary.  See  the  example  in  Section  2.  Numerical  analysis  for  dynamical 
models  is  usually  done  in  a  bounded  region.  A  common  way  of  bounding  a 
state  space  is  to  impose  a  reflecting  boundary,  if  one  does  not  exist  already.  One 
selects  the  region  so  that  the  boundary  plays  a  minor  role.  Alternatively,  one  can 
stop  the  process  on  exit  from  a  given  bounded  set.  We  work  with  the  reflecting 
boundary  case,  since  it  is  of  considerable  importance.  The  modifications  that 
are  required  when  the  boundary  is  absorbing  are  minor. 

The  model  and  assumptions  are  in  Section  2.  The  assumptions  on  the  re¬ 
flection  directions  are  those  in  [10]  and  are  standard  for  the  reflected  diffusion 
model  (also  known  as  the  Skorokhod  problem).  Section  3  is  concerned  with  a 
representation  of  the  solution  in  terms  of  a  type  of  stochastic  wave  equation 
without  delay  terms.  This  is  an  extension  to  the  general  nonlinear  stochastic 
system  of  the  idea  in  [13]  for  the  deterministic  linear  problem.  The  reference 
contains  a  history  of  the  idea,  still  for  the  linear  deterministic  system.  With 
this  representation,  the  delays  are  eliminated,  but  one  must  solve  a  PDE.  It  is 
shown  that  the  representation  is  equivalent  to  the  original  problem  in  that  any 
solution  to  one  yields  a  solution  to  the  other.  Owing  to  the  special  form  of  the 
PDE,  there  are  straightforward  ways  of  getting  numerical  approximations. 

To  prepare  ourselves  for  what  will  be  required  for  the  numerical  approxima¬ 
tions,  a  discrete-time  approximation  is  developed  in  Section  4.  This  will  suggest 
the  correct  scaling  and  illustrate  the  type  of  algebraic  manipulations  that  are 
required.  It  might  also  be  a  convenient  way  of  simulating  the  original  system. 
The  Markov  chain  approximation  method  is  reviewed  in  Section  5,  starting  with 
the  simpler  no-delay  case.  The  types  of  continuous-time  interpolations  that  will 
be  of  interest  are  outlined  and  the  asymptotic  equivalence  of  their  scalings  is 
proved.  We  try  to  set  the  problem  up  so  that  the  methods  and  results  of  [10] 
can  be  used  without  excessive  duplication  of  details  and  the  proofs  in  [10]  can 
be  appealed  to  and  used  where  possible.  The  form  of  the  transition  probabilities 
for  the  approximating  Markov  chain  are  given.  The  fundamental  assumption 
required  for  the  convergence  of  the  numerical  procedure  is  the  so-called  “local 
consistency  condition”  [10].  This  says  little  more  than  that  the  conditional  mean 
change  (resp,  variance)  in  the  state  of  the  approximating  chain  is  proportional 
to  the  drift  (resp,  covariance)  of  the  diffusion,  modulo  small  errors.  This  would 
seem  to  be  a  minimal  condition.  In  general,  it  need  not  hold  everywhere  (see, 
e.g.,  [10,  Section  5.5]). 

The  proofs  of  convergence  in  [10]  are  purely  probabilistic,  being  based  on 
weak  convergence  methods.  The  idea  is  to  interpolate  the  chain  to  a  continuous¬ 
time  process  in  a  suitable  manner,  and  then  show  that  the  interpolated  processes 
converge  to  an  optimal  diffusion  as  the  approximating  parameter  goes  to  zero. 
The  Skorohod  topology  is  used  on  the  path  spaces.  The  criterion  for  tightness 
for  vector-valued  processs  is  [10,  Theorem  2.1,  Chapter  9]. 

Section  6  contains  the  main  development  of  the  approximation  for  the  delay 
case.  Motivated  by  the  ideas  in  Sections  4  and  5,  it  is  shown  how  to  construct 
the  approximating  chains  for  the  representation  introduced  in  Section  3.  The 
method  is  very  close  to  that  used  for  the  no-delay  case.  Representations  of  the 
resulting  process  are  developed  that  facilitate  proving  that  the  optimal  costs 
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for  the  approximating  chain  converge  to  that  for  the  original  diffusion.  The 
development  is  designed  so  that  the  convergence  proofs  in  [10]  can  be  appealed 
to.  In  Section  7,  the  size  of  the  state  space  that  is  needed  for  the  solution  of  the 
Bellman  equation  is  discussed,  and  it  is  seen  that  the  approach  does  moderate 
the  requirements  considerably.  Although  the  form  of  the  algorithm  is  motivated 
by  those  used  for  the  no-delay  problem,  it  is  more  complicated.  But  then  the 
delay  problem  is  substantially  more  complicated  and  the  proposed  algorithm 
seems  considerably  better  in  terms  of  memory  requirement  than  any  current 
alternative  if  the  control  and/or  reflection  terms  are  delayed. 

The  algorithms  can  be  used  to  obtain  optimal  value  functions  or  controls. 
Controls  for  delay  systems  will  usually  be  too  complicated  for  direct  practical 
use.  But  the  values  provide  benchmarks.  One  would  normally  try  to  approx¬ 
imate  the  optimal  control  by  realizable  forms.  For  example,  one  might  try  to 
approximate  the  control  for  the  example  in  Section  2  by  a  rate  dependent  thresh¬ 
old  on  the  buffer  size.  The  algorithms  can  be  used  for  numerical  exploration. 
For  example,  to  explore  the  effect  of  changing  delay,  under  optimality  condi¬ 
tions.  Generally,  a  cost  function  is  a  compromise  between  competing  criteria, 
and  the  values  of  the  individual  components  of  the  cost  as  well  as  the  total 
value  are  important.  By  varying  the  weights  or  the  form  of  the  components,  the 
algorithms  give  the  tradeoffs  between  components  under  the  best  conditions, 
namely  under  the  optimal  controls.  All  of  this  is  very  useful  information  for  the 
designer. 


2  The  Model  and  Assumptions 


A  motivating  example.  Due  to  the  finite  speed  of  electromagnetic  signaling, 
delays  are  a  common  and  crucial  part  of  many  telecommunications  systems. 
One  important  example  is  the  AIMD  (additive  increase  multiplicative  decrease) 
model  that  arises  in  (FTP)  control  of  internet  traffic  over  long  distances.  The 
following  model,  from  [1],  is  a  good  example. 

dxi{t)  =  Cidt  —  Kody22{t  ~  't)  +  KlUi{x2{t  —  T),Xi{t  —  T))dt  +  dzi{t), 

X2{t)  —  X2{Q)  =  [  [xi{s)  —  b]ds  +  w{t)  +  Z2{t).  (2.1) 

Jo 

Here  Ci  and  b  are  constants,  w(-)  is  a  real- valued  Wiener  process,  X2{-)  represents 
a  scaled  buffer  content  and  xi(-)  is  a  scaled  and  centered  rate  of  transmission. 
The  buffer  and  source  are  separated  by  a  channel  with  a  transmission  delay. 
The  buffer  is  finite  and  0  <  X2{t)  <  B.  The  reflection  term  Z2{-)  serves  to 
keep  X2{-)  in  the  desired  range.  We  write  Z2{-)  =  J/2i(')  —  2/22(0  where  the  non¬ 
decreasing  reflection  component  2/22(0  represents  buffer  overflows  (lost  packets) 
and  can  increase  only  when  X2(t)  =  B.  The  nondecreasing  component  2/2i(0 
keeps  the  buffer  content  nonnegative.  In  this  model,  overflow  packets  are  not 
acknowledged.  Hence  after  the  communication  delay  r,  the  lack  of  acknowledg¬ 
ment  causes  an  automatic  decrease  in  the  transmission  rate  a;i(0-  The  mi(0  is 
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a  control  that  allows  the  user  to  adjust  its  transmission  rate  as  a  function  of  the 
delayed  data.  The  scaled  rate  variable  a;i(-)  might  not  be  physically  bounded. 
But  for  the  purposes  of  numerical  approximations,  it  is  usually  confined  to  an 
appropriate  set  and  kept  in  that  set  by  means  of  boundary  reflection  Zi(-).  Note 
that  the  delayed  reflection  term  2/22(')  is  an  essential  component  of  the  model. 

The  model.  The  general  model  has  the  form  of  a  controlled  reflected  diffusion 
process  with  possible  delays  in  the  state,  control,  and  reflection  term.  Let  fR’’ 
denote  Euclidean  r-space.  The  r-dimensional  state  process  x(-)  will  be  con¬ 
fined  to  a  convex  polyhedron  G  G  WG ,  with  a  nonempty  interior,  by  means  of 
the  boundary  reflection  process  z{-),  as  in  [6].  The  conditions  on  the  reflection 
directions  on  the  boundary  are  spelled  out  below.  Let  Xi  denote  the  ith  com¬ 
ponent  of  a  vector  x.  Note  that  what  is  usually  penalized  are  buffer  overflows, 
and  on  the  boundaries  associated  with  overflows,  the  reflection  directions  are 
usually  inward  normals  to  the  boundary. 

Assumptions  on  the  state  space  G.  Assumptions  (A2.1)  and  (A2.2)  are 
the  ones  used  in  [10]  (see  Section  5.7  of  that  reference),  and  are  standard  in  the 
treatment  of  general  reflecting  diffusions  [3,  4],  [8,  Section  3.5]. 

A2.1.  The  state  space  G  is  the  intersection  of  a  finite  number  of  closed  half 
spaces  in  Euclidean  r-space  FG  and  is  the  closure  of  its  interior  {i.e.,  it  is  a 
closed  convex  polyhedron  with  an  interior  and  planar  sides).  Let  dGi,  i  =  1, . . . , 
denote  the  faces  of  G,  and  Ui  the  interior  normal  to  dGi.  Interior  to  dGi,  the 
reflection  direction  is  denoted  by  the  unit  vector  di,  and  {di,ni)  >  0  for  each  i. 
The  possible  reflection  directions  at  points  on  the  intersections  of  the  dGi 
in  the  convex  hull  of  the  directions  on  the  adjoining  faces.  Let  d{x)  denote  the 
set  of  reflection  directions  at  the  point  x  G  dG,  whether  it  is  a  singleton  or  not. 
No  more  than  r  constraints  are  active  at  any  boundary  point. 

A2.2.  For  each  x  G  dG,  define  the  index  set  I{x)  =  {i  :  x  G  dGi\.  Suppose 
that  X  G  dG  lies  in  the  intersection  of  more  than  one  boundary;  that  is,  I{x)  has 
the  form  I{x)  =  {ii,. ..  ,ik}  for  some  k  >  1.  Let  N{x)  denote  the  convex  hull 
of  the  interior  normals  Ui^, . . . ,  ni,.  to  dGi^ ,  •  ■  • ,  dGi,. ,  resp.,  at  x.  Then  there  is 
some  vector  v  G  N{x)  such  that  y'u  >  0  for  all  7  G  d{x). 

There  is  a  neighborhood  N{dG)  and  an  extension  of  d{-)  to  N{dG)  that  is 
upper  semicontinuous  in  the  following  sense:  For  each  e  >  0,  there  is  p  >  Q  that 
goes  to  zero  as  e  ^  Q  and  such  that  if  x  G  N{dG)  —  dG  and  distance{x,  dG)  <  p, 
then  d{x)  is  in  the  convex  hull  of  the  directions  {d{v);v  G  dG,  distance(x,v)  < 

e}. 
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Our  general  model  takes  the  form,  where  we  define  d^{0)  =  /i(0)  —  ii{6  —  d6) 


dx{t)  =  c{x{t),u{t))dt  +  dt  J  b{x{t  +  9),u{t  +  9),0)d6 

+dt  J  g{x{t  +  9),u{t  +  9),9)dfi{9)  +  a{x{t))dw{t)  +  dz{t)  (2.2) 


-\-dt 


p{9)dgy{t  +  9). 


IS=-7 


Here  w{-)  is  a  standard  and  possibly  vector- valued  Wiener  process,  z{-)  is  the 
reflection  term  and  r  >  0  is  the  non-random  maximum  delay.  We  can  represent 
z{-)  as  z{t)  =  '^idiyi{t),  where  yi{-)  denotes  the  nondecreasing  process  that 
can  increase  only  when  x{t)  is  on  the  zth  face  of  G.  The  last  integral  in  (2.2)  is 
with  respect  to  d9  in  the  sense  that 


p{9)dgy{t  +  9)  =  p{9)  [y{t  +  9  +  d9)  -  y{t  +  9)]  . 

The  initial  condition  is  the  pair  x  =  {a:(s),  — r  <  s  <  0},  u  =  {u(s),  — r  <  s  < 
0}.  One  could  incorporate  the  term  containing  &(•)  into  the  term  containing  (/(•). 
But  the  idea  is  that  the  first  term  represents  distributed  delays,  while  the  latter 
can  represent  “point”  delays,  ”  so  we  prefer  to  use  both  terms.  See  [8,  10]  for 
more  detail  on  controlled  reflected  diffusions. 

Note  that  in  the  example  (2.1)  the  delayed  reflection  term  is  y22{‘),  a  com¬ 
ponent  of  Z2{-),  and  it  appears  only  in  the  equation  for  a;i(-).  Since  a;i(-)  is 
bounded,  the  processes  X2{-),Z2{-)  are  well  defined  and  hence  the  term  y22{') 
in  the  equation  for  a;i(-)  is  well  defined;  it  is  continuous  and  nondecreasing.  In 
order  to  assure  that  the  reflection  and  solution  for  (2.2)  are  well  defined  we 
need  to  assure  that  the  delayed  reflections  and  non-delayed  reflections  are  “sep¬ 
arated.”  This  consideration  and  the  structure  of  the  motivating  example  lead 
to  the  following  assumption  for  the  p{-)  in  (2.2). 


A2.3.  There  is  tq  €  (0,t]  such  that  p{9)  =  0  for  9  >  — tq. 

A2.4.  The  control  takes  values  in  a  compact  set  U  and  is  measurable  (as  a 
function  of  {oj,t))  and  is  nonanticipative  with  respect  to  w{-).  The  functions 
6(-),  c(-),p(-),  (/(•)  are  hounded  and  continuous.  All  functions  of  9  have  value 
zero  for  9  <  — r  and  9  >  0.  The  function  ct(-)  is  bounded  and  continuous  and 
/x(-)  is  a  finite  measure  on  [— r,  0]  with  /r([— t,  0])  — >  0  as  t  ^  0.  We  suppose 
that  zff)  =  0,  t  <  0. 


A2.5.  There  is  a  unique  weak  sense  solution  to  (2.2)  for  each  initial  condition 
and  admissible  control. 

The  reflected  diffusion  model  (2.2)  is  known  as  the  Skorokhod  problem.  For 
a  detailed  discussion  of  the  Skorohod  problem  and  the  assumptions  (A2.1)  and 
(A2.2),  see  [8,  Chapter  3],  [3,  4].  Let  \z\{t)  denote  the  variation  of  z{-)  on  the 
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interval  [0,  t].  By  a  solution  to  (2.2)  we  mean  the  following.  The  z{-)  is  the  reflec¬ 
tion  process  and  satisfies  the  following  conditions:  |z|(t)  <  oo  with  probability 
one  (w.p.l)  for  all  t,  and  there  is  a  measurable  function  7(-)  with  j{t)  S  d{x{t)) 
w.p.l  such  that  z{t)  =  7(s)d|2;|(s).  This  says  only  that  the  reflection  process 

can  change  only  when  x(t)  is  on  the  boundary,  and  the  increments  are  in  a 
correct  reflection  direction. 

Comments  on  the  assumptions.  One  can  always  construct  the  extension  in 
(A2.2).  Under  (A2.1)-(A2.2),  the  choice  of  the  reflection  direction  on  the  corners 
and  edges  of  G  has  no  effect  on  the  process.  To  see  that  (A2.1)  is  natural  in 
applications  note  the  following.  If  the  state  space  is  being  bounded  for  purely 
numerical  reasons,  then  the  reflections  are  introduced  only  to  give  a  compact 
set  G,  which  should  be  large  enough  so  that  the  effects  on  the  solution  in  the 
region  of  main  interest  are  small.  A  common  choice  is  a  hyperrectangle  with 
interior  normal  reflection  directions.  The  condition  (A2.2)  implies  (see  [3,  8]), 
the  so-called  “completely-S”’  condition,  the  fundamental  boundary  condition  for 
the  modelling  of  stochastic  networks,  [5,  8,  12],  and  which  is  used  to  ensure  that 
z(-)  has  bounded  variation  w.p.l.  Various  extensions  are  possible.  A  jump  term 
can  be  added  with  no  additional  problems,  provided  that  it  does  not  involve  a 
delay.  The  a:(-)  could  also  be  delayed  in  the  function  a{-).  A  delayed  Wiener 
process  term  such  as 

dt  f  a^{x{t  +  9),6)dgw{t  +  9) 

Jo 

can  be  added. 

Comment  on  the  delayed  reflection  term  in  (2.2).  Consider  a  one¬ 
dimensional  problem.  Let  p{9)  =  l/J  on  the  interval  [—A  — 5,  —  A],  A  >  0, 5  >  0, 
with  value  zero  elsewhere.  Then,  with  y(t)  =  0,t  <  0,  we  have 

[  ds  [  p{9)dey{s  +  9)  =  ^  [  y{s)ds  y{t  -  A) 

Jo  J-T  0 

for  small  6.  In  this  way,  point  delays  as  well  as  distributed  delays  can  be 
approximated. 

Relaxed  controls.  For  purposes  of  proving  approximation  and  limit  theo¬ 
rems,  it  is  usual  and  very  convenient  to  work  in  terms  of  relaxed  controls. 
Recall  the  definition  of  a  relaxed  control  m(-)  [10].  It  is  a  measure  on  the 
Borel  sets  of  [/  x  [0,oo),  with  m{A  x  [0,-])  being  measurable  and  nonantic- 
ipative  with  respect  to  w{-)  for  each  Borel  A  G  U,  and  satisfying  m{U  x 
[0,t])  =  t.  Write  m{A,t)  =  m{A  x  [0,t]).  The  left-hand  derivative^  m'{da,t)  = 
limo<5^o[TO(c^Q:5  f)  —  m{da,  t  —  5)]/(5  is  deflned  for  almost  all  (w,  t).  By  the  def¬ 
initions,  m{dads)  =  m! {da,  s)ds.  For  0  <  v  <  r,  we  write  m{da,ds  —  v)  for 

^In  [10]  mt  was  used  to  denote  the  derivative.  But  this  notation  would  be  confusing  in  the 
context  of  the  notation  required  to  represent  the  various  delays  in  this  paper. 
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m{da,s  —  v)  —  m{da,s  —  ds  —  v).  The  weak  topology  is  used  on  the  relaxed 
controls.  Thus  m"(-)  converges  to  to(-)  if  and  only  if//  </>(«,  s)m"  (da  ds)  — > 
/  /  s)m{dads)  for  all  continuous  functions  (/)(•)  with  compact  support. 
With  this  topology,  the  space  of  relaxed  controls  is  compact.  An  ordinary 
control  u{-)  can  be  written  as  the  relaxed  control  to(-)  defined  by  its  derivative 
m! {A,t)  =  I^u{t)GA}i  where  Ik  is  the  indicator  function  of  the  set  K.  Then 
m(A,  t)  is  the  amount  of  time  that  the  control  takes  values  in  the  set  A  by  time 
t.  The  controls  for  the  numerical  procedure  will  be  ordinary  controls,  but  those 
for  the  limit  might  be  relaxed.  For  the  initial  condition  on  [— r,  0]  we  use  an 
ordinary  control.  The  relaxed  control  form  of  (2.2)  is 


dx{t)  =  dt  f  c{x{t),a)m'{da,t)  +  dt  f  f  b{x{t  +  9),a,9)m'{da,t  +  d)d0 

Ju  JuJ-T 


-\-dt 


g{x{t  +  9),a,  9)m'{da,  t  +  9)dii{9)  +  a{x{t))dw{t) 


l-T  JU 


+dz{t)  +  dt 


I0=-T 


p{9)dey{t  +  9) . 


(2.3) 

The  following  theorem  is  [10,  Theorem  1.1,  Chapter  11].  The  last  assertion 
is  proved  by  working  recursively  on  intervals  (see  (A2.3)  for  the  definition  of  tq) 
(0,  To],  (to,  2ro], , . . . ,  until  the  interval  (0,  T]  is  covered. 


Lemma  2.1.  Assume  (A2.1)-(A2.2).  Let  /(•)  and  a{-)  he  measurable  and  non- 
anticipative  processes  of  the  appropriate  dimension,  and  hounded  in  absolute 
value  by  some  constant  K  <  oo.  Define 


dXft)  =  f{t)dt  +  a{t)dw{t)  +  dZ{t),  A(0)  €  G, 


where  Z(-)  is  the  reflection  term.  Let  \Z\{t)  denote  the  variation  of  Z{-)  on  the 
interval  [Q,f\.  Then 

lim  sup  A|Z|2(T)  =  0.  (2.4) 

For  each  T  <  oo, 

sup  E\Z\^{T)  <  oo.  (2.5) 

X{0)J,a 

Let  Yi{-)  denote  the  component  of  the  reflection  process  that  is  due  to  reflection 
on  the  ith  face,  with  corners  and  edges  assigned  in  any  way  at  all  to  the  adjacent 
faces.  Assume  the  condition  (A2.3)  onp(-),  and  redefine  A(-)  by 

dX{t)  =  f{t)dt  +  a{t)dw{t)  +  dt  f  p{9)deY{t  + 9)  +  dZ{t),  A(0)  S  G. 

Je=-T 

Then  (2.4)  and  (2.5)  continue  to  hold. 

With  p{-)  omitted,  the  bounds  in  (2.4)  and  (2.5)  depend  on  if  sup«y  |A(0)  + 
/o  f{s)ds+ f*  (T(s)dw(s)p,  and  the  appropriate  “recursive” adjustments  are  made 
when  p(-)  yf  0. 


An  extension.  Suppose  that  the  system  evolves  as  dX{t)  =  f{t)dt+a{t)dw{t)+ 
dt  p{6)dgY {t  +  9)  on  the  intervals  [nA,  nA  +  A),  and  the  reflection  comes 

in  at  times  nA  if  X{nA—)  ^  G.  Then  the  lemma  holds  if  lim^.A^o  replaces 
limT’^o  in  (2.4).  The  proof  is  similar  to  that  in  the  reference. 

A  discounted  cost  function.  Let  x  and  u  denote  the  canonical  value  of  the 
path  and  control  segments,  resp.,  on  [— r,  0].  For  /3  >  0,  some  vector  q,  and 
control  process  u{-)  on  [0,oo),  the  cost  function  is 

pOC) 

W{x,u,u)  =  El^  e~>^*[k{x{t),u{t))dt  +  q'dy],  (2.6) 

Jo 

with  the  analogous  form  for  a  relaxed  control.  The  function  k{-)  is  bounded, 
continuous,  and  real-valued.  If  two  adjacent  faces  of  G  have  the  same  reflection 
direction,  then  the  associated  components  of  the  vector  q  must  be  the  same.  By 
Lemma  2.1,  the  reflection  term  component  of  the  cost  is  well  defined.  The  Llif* 
denotes  the  expectation  given  the  initial  condition  and  that  control  u{-)  is  used. 
Let  V{x,u)  denote  the  inflmum  of  the  costs,  over  all  controls. 

Existence  of  an  optimal  control.  The  existence  of  an  optimal  relaxed  control 
was  shown  in  [6,  Theorem  2.1]  for  a  one-dimensional  model  without  a  delayed 
reflection  term.  The  proof  in  the  case  of  concern  here  is  similar.  One  takes 
minimizing  sequences  of  controls  and  then  a  weakly  convergent  subsequence  of 
the  (path,  relaxed  control,  Wiener  process,  reflection  term)  and  shows  that  the 
limit  satisfies  (2.3)  and  that  the  minimizing  sequence  of  costs  converges  to  the 
cost  for  the  limit  processes.  Furthermore  ([6,  Theorem  2.3])  the  inflmum  of  the 
cost  over  the  relaxed  controls  is  equal  to  the  inflmum  of  the  costs  over  ordinary 
controls. 


3  A  Useful  Representation  of  x{') 

For  — T  <  9  <0,  define  processes  x°(')  and  x^(')  by 

dx°{t)  =  X^{t,0)dt  +  c{x'^{t),u{t))dt  +  a{x°{t))dw{t)  +  dz°{t),  (3.1) 

dtX^it,  9)  =  -dgx^it,  d)  +  b{x°{t),u{t),9)dt 

+9{x°{t),u{t),9)  [n{9)  -  ii{9  -  dt)]  +p{9)dy°{t). 

The  reflection  term  z°(-)  will  be  for  the  process  X°(’)>  which  takes  values  in  the 
constraint  set  G  and  is  subject  to  the  boundary  conditions  (A2.1)  and  (A2.2). 
The  interpretation  of  the  stochastic  partial  differential  equation  (3.2),  as  well 
as  of  the  relaxed  control  form  (3.2r)  below,  is  given  by  (3.5),  (3.6)  below,  which 
defines  the  solution.  Theorem  3.1  shows  that  x°(’)  =  If  there  is  no  delayed 
reflection  term,  then  the  values  of  x^it^d)  will  be  seen  to  be  bounded.  If  the 
solution  is  not  bounded,  then  “numerical”  bounds  will  have  to  be  added,  and 
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we  return  to  this  point  in  Section  6.  The  relaxed  control  forms  of  (3.1)  and  (3.2) 
are 

dx°{i)  =X^{t,^)dt+  [  c{x°{t),a)m\da,t)dt  +  a{x°{t))dw{t)  +  dz°{t),  (3.1r) 
Ju 


dtx^{t,e)  = -dex^{t,e)  +  dt  [  b{x°{t),a,9)m' {da,t) 

Ju 


9{x°{t),o:,  6)m'{da,  t)  [^(6»)  -  ^(6<  -  dt)]  +  p{e)dy°{t). 


(3.2r) 


These  processes  will  be  the  basis  of  the  development  of  the  numerical  method. 
The  linear  and  deterministic  form  of  (3.1),  (3.2)  were  used  in  [13]  to  represent 
the  linear  and  deterministic  analog  of  (2.2),  but  without  the  analog  of  the  terms 
involving  /r(-)  or  the  reflection. 

The  initial  conditions  for  (3.1)  and  (3.2)  are  x°(0)  =  a^(0)i  -2°(s)  =  z^{s,0)  = 
0  for  s  <  0,  and 


l-e 

X^{0,d)=J  b{x{-i  -  e),u{x  -  0,x)dl 

~Ie 


/t?  pO 

9{x{x  -  -  d),x)dy{l)  +  J  p{x)d^y°{x  -  9). 


(3.3) 


The  boundary  condition  is  x^(t,  — r)  =  0. 


Note  on  dimension  and  size  of  the  system  state.  The  dimension  of 
X^(-)  is  equal  to  the  number  of  components  of  x(-)  whose  dynamical  terms  have 
delays.  Thus  the  method  would  currently  be  impractical  if  the  dynamics  of  more 
than  one  component  of  x  contained  delays.  For  the  components  Xi{-)  whose 
dynamical  terms  do  not  have  delays,  simply  define  Xi(')  =  x\{')  =  0- 

The  dimension  of  x^(')  does  not  depend  on  the  number  of  controls.  Suppose 
that  delayed  values  of  components  Xi{-),i  =  1, . . . ,  ri,  =  1, . . . ,  r2,  and 

yi{-),i  =  1, . . . ,  ra,  are  required.  For  the  original  problem,  the  full  system  state 
consists  of  the  initial  condition  a;(0)  and  the  memory  segments  of  the  Xi{-),i  = 
l,...,ri,  =  l,...,r2,  and  yi{-),i  =  l,...,r3,  where  the  Ui{-)  have  no 

particular  regularity  properties  and  are  difficult  to  approximate  efficiently.  The 
full  system  or  memory  state  for  (3.1),  (3.2),  is  just  a;(0)  and  the  current  values 
X^{t,d),  -r  <  6»  <  0. 


A  semigroup  representation  of  (3.1),  (3.2).  The  part  (itx^(^;  ^)  =  ~d0X^{tjd) 
of  (3.2)  is  a  type  of  wave  equation  and  its  semigroup  will  play  a  major  role.  Fol¬ 
lowing  [13],  define  the  semigroup  <&(•)  (where  —t<9<0)  by 


f{9  —  t),  —T<9  —  t<0, 

0,  otherwise. 


(3.4) 


<!)(•)  will  only  act  on  functions  of  9. 
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The  construction  of  the  numerical  approximations  will  use  the  dynamical 
representation  (3.1),  (3.2)  as  a  heuristic  guide,  but  the  solution  to  (3.1),  (3.2) 
is  always  interpreted  in  the  “variation  of  constants”  form 

c^X°(f)  =  + +dz°{t),  (3.5) 


xHt,e) 


-  s)  [b{x°{s),u{s),9)ds+p{0)dy°{s) 


+  f  $(t  -  s)g(x°(s),u(s),9)  [p(0)  -  fj.{e  -  ds)]  . 
Jo 


(3.6) 


The  integral  involving  /i(-)  is  well  defined,  since  the  integration  is  to  be  done 
after  the  operation  by  <i>(t  —  s)  and  we  can  write 


^(i  -  s)g{x°{s),u{s),e)  [/r(6»)  -  g{e  -  ds)] 

=  /  9{x°{s),u{s),9  -  t  +  s)/{_.r<e-t+s<o}  [m(^  -  t  +  s)  -  g{9  -  t  +  s  -  ds)] 
Jo 

l<e 

=  /  9{x°{x  +  t-^),u{x  +  t-9),x)dg{x). 

J  max{0— i,  — r} 

(3.7) 

For  the  relaxed  control  form  of  (3.6),  use  the  forms 


f  —  s)  f  b{x^{s),a,9)m'{da,  s)ds, 
Jo  Ju 


s)9{x°{s),0!,  9)m'{da,  s)  [g{9)  -  g{9  -  ds)] , 


and  in  (3.5)  use  dt  Jjj  c{x^{t),a)m'{da,t).  The  cost  function  is  (2.6)  with  x°(t) 
replacing  x{-). 

The  following  theorem  is  a  nonlinear  and  stochastic  version  of  the  linear  and 
deterministic  result  in  [13]. 


Theorem  3.1.  Assume  (A2.1)-(A2.5).  Then  (3.1)  and  (3.2)  have  the  weak 
sense  unique  solution 

X°(-)  =  ^(-),  (3.8) 


X^(l,Q)=  /  Kx°(^  +  7-fi'),M(t  +  7-d),7)d7  + 


p(7)d^?/°(t +  7-  6») 


p9 

+  /  9{x°{t  +  l-0),u{t  +  x-9),x)dg{x)- 


(3.9) 


The  analogous  result  holds  for  the  relaxed  control  form,  where  we  use 

J  J  b{x°{t  +  l-^),a,  x)m'{da,  t  +  7  -  9)d'y, 


11 


J  j  9{x^{t  +  l  -  0),a,-i)m'{da,t  +  -i  -  e)d^i,{'^) 

in  place  of  the  first  and  third  terms  on  the  right  side  of  (3.9). 

Comment  on  uniqueness.  We  assumed  that  (2.2)  and  (2.3)  have  unique 
weak-sense  solutions  for  any  admissible  control.  The  proof  shows  that  any 
solution  to  (3.5),  (3.6)  has  the  form  (3.8),  (3.9).  Given  x{-)  satisfying  (2.2) 
or  (2.3),  replace  x°(-)  in  (3.9)  by  x{-).  Setting  9  =  0  and  substituting  (3.9) 
into  (3.5)  with  x°(-)  =  x(-)  yields  that  (3.5)  is  just  (2.2).  Conversely,  given  a 
solution  to  (3.5),  (3.6),  where  x^(')  is  given  by  (3.9),  we  have  that  x°(')  solves 
(2.2).  Hence,  the  solution  to  (3.5),  (3.6)  is  also  weak-sense  unique. 


Proof.  For  simplicity  in  the  notation,  work  with  ordinary  rather  than  relaxed 
controls.  The  development  for  the  relaxed  control  form  is  analogous. 

From  the  comments  above  concerning  uniqueness,  we  need  only  show  that 
any  measurable  and  non-anticipative  solution  (x°(’)5  X^('))  satisfying  (3.5),  (3.6) 
implies  (3.9).  Consider  the  representation  (3.6).  The  component  due  to  the 
initial  condition  in  (3.3)  is 


^(i)x^(O.^)  =  J  -  0  +  t),u{-f  -  9  +  t),-f)I{_r<e-tm^'y 

/9—t 

5(X°(7  -  9  +  t),u{'y  -  0  +  t),'y)l{_r<0-t<o}dp.{x) 

+  J  p{l)dyy°{x-9  +  t) 


^max{0— t,  — r} 


^(X°(7  -9  +  t),u{x-9  +  t),x)d'y 

^max{0— i,— r} 


^max{0— i,— r} 


5(X°(7  -9  +  t),u{j-9  +  t),j)dfj,{x)  (3.10) 

p{l)d^y°h-9  +  t). 


At  0  =  0,  we  have 

^max  {  — T,— f} 


^(x°(7  +  ^),'w(f  +  7),7)c^7- 

^max{  — i,— r} 

.0 


^max{  — t,— r} 


+ 


5(X  (7  +  +  f),x)dy,{x) 

p{l)d^y°{l  +  f). 


(3.11) 


Continuing  with  the  main  term  in  (3.6)  involving  5(-),  we  have,  with  the 
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substitution  ^  =  6  —  t  +  sors  =  ^  —  9  +  t, 


[  <^{t- s)b{x°{s),u{s),9)ds=  [  b{x°{s),u{s),9 -t  + s)I[_r<e-t+s<o}ds 
Jo  ^  Jo 

Jd-t 
f-e 

Kx°(7  -0  +  t),u{x-9  +  t),x)dx. 

(3.12) 


J  max{0— t,— r} 

Now  consider  the  delayed  reflection  term.  For  the  main  term  in  (3.6), 


[  <^{t-  s)p{9)dy^{s)  =  [  p{9-t  +  s)I[_r<0-t+s<o}dy°{s) 

Jo  Jo 

p9  p9 

=  p{'l)I{-r<'rm^7y°(x-9  +  t)=  p{x)djy°{j-9  +  t). 

J  9—t  Jmaxi9—t.—T\ 


(3.13) 

The  main  term  in  (3.6)  involving  the  measure  p{-)  was  evaluated  in  (3.7). 
Adding  (3.10),  (3.12),  (3.13)  and  (3.7),  yields  (3.9).  Setting  0  =  0  in  (3.9) 
and  substituting  it  into  (3.5)  yields 

/O  pO 

b{x{t  +  x),u{t  +  x),x)dx  +  dt  J  p{x)d^y°{t  +  x) 

7° 

+dt  J  g(x°(7  +  'y),u{t  +  x)n)dp{x)  +  c{x°{t),u{t))dt  +  a{x° {t))dw{t)  +  dz°{t), 
which  is  the  equation  for  x{-).  ■ 


4  A  Discrete  Time  and  State  Approximation: 
Motivation 


A  numerical  procedure.  The  forms  (3.1),  (3.2)  will  motivate  the  numerical 
algorithms.  But  we  will  need  to  show  that  the  processes  associated  with  the 
numerical  algorithms  converge  to  (3.5),  (3.6)  with  x°(.)  =  a;(.),  where  a;(.)  solves 
(2.2)  or  (2.3).  The  actual  numerical  algorithms  for  getting  the  optimal  costs  will 
be  discussed  in  Section  6.  To  prepare  ourselves  for  that  discussion  and  the  types 
of  algebraic  manipulations  that  will  be  needed,  in  this  section  we  will  discuss  a 
simple  discrete  time  approximation  to  (3.5),  (3.6).  This  approximation  is  not 
intended  to  be  used  to  solve  the  optimal  control  problem,  but  it  will  provide 
helpful  insights  and  guides  and  is  useful  for  simulation. 

Let  us  formally  consider  a  simple  discrete-time  discrete-0  approximation  to 
(3.1),  (3.2),  where  6  is  the  interval  for  9  and  A  is  the  time  interval  and  use 
piecewise  constant  interpolations  in  time.  Then,  for  — r  <  9  <  0,  t  =  nA,  and 


13 


letting  denote  the  approximations,  we  can  write 

+  A)  -  0)  +  Ac(x°’'’^(t),  M(t))+ 

a(x°’'’^(^))[^c(^  +  A)  -  ic(t)]  +  [^od.A(^  +  A)  - 

+  A,  0)  -  x'’'’^(t,  e)  =  -  [x'’'’^(t,  9)  -  0  -  <5)]  I  (4.1) 

+ A  [6(x°’^’^ it) Mt),9)+  p{e)  (t  +  A)  -  yO-^A (i)] ] 

with  boundary  condition  X^’'^’^(l)  ~'r)  =  0  ^iid  initial  condition  being  the  dis¬ 
cretization  of  (3.3).  The  above  is  the  reflection  process  for  x°’‘^’"^(’)) 

and  diU^'  ’  (•)  is  the  component  due  to  reflection  on  the  tth  face  of  G.  If  this 
process  ever  leaves  the  set  G,  then  it  is  immediately  reflected  back  in  accordance 
with  the  local  reflection  direction  as  defined  in  (A2.1),  (A2.2).  Note  that  the 
backward  difference  in  9  is  used.  Since  we  are  interested  in  only  the  general  ap¬ 
proach,  let  us  suppose  that  u{-)  is  continuous.  In  Sections  5  and  6,  the  controls 
are  arbitrary. 

We  must  have  A  =  (5  if  there  is  to  be  convergence  to  the  correct  limit  as 
the  discretization  levels  go  to  zero.  So,  let  A  =  <5  henceforth  and  with  u^{-) 
denoting  the  piecewise  constant  discretization  of  u{-),  rewrite  (4.1)  as 

X^’\t  +  5)=  x^’\t)  +  5x^'\t,  0)  +  6c{x°^\t)y{t)) 

+a(x°’^(t))  [w{t  +  6)-  w{t)]  +  +  6)-  , 

x'^'^t  +  S,  9)  =  x^'\t,9-  S)  +  6[b{x°'\t),u\t),9)] 

+9{x°{t),u\t),9)  [K9)  -  K9  -  <5)]  +  p{9)  +  S)  -  y°’‘^(t)]  , 

(4.3) 

with  x^’'^(^j  —t)  =  0. 

For  functions  f{t,9),  deflne  the  operator  4)'^,  analogously  to  (3.4),  by 

{<^>^f{-)){t,9)  =  f{t,9-S),  -t<9-S<0,  (4.4) 

with  (4)'^/(.))(t,  6*)  =  0  otherwise.  Iterating  (4.4)  yields 

[4>^]V(*<5,0)  =  fits,  9-  kS)I^_,<o-ksm-  (4-5) 

Now  (4.3)  can  be  written  as 

X^^\t  +  6,-)=  .)  +  <5  [bix°'\t),u\t),-)] 

+9ix°it),u\t),9)  [pi9)  -  pi9  -  S)]  +p{-)  [y°’^{t  +  S)-y°’\t)]  , 

(4.6) 


14 


with  ~t)  =  0-  Iterating  yields 

n—1 

X^’^(nS,0)  =  + 

2=0 

n—1 

+  [g(x°’H^S),u^(2S),0)]  lg(0)  -  ^(0  -  S)] 

2=0 

n—1 

+  J2i^Y~'~^pW  +  S)-  y0’^(*5)]  . 


(4.7) 


2=0 


and 


X°’\nS)  =  x°’'^(0)  +  <5  ^  0)  +  ^  X!  c(x°’'^(i(5),  M‘^(t(5)) 

2=0  2=0 

n—1 

+  ^  cr(x°’'^(*<^))  [w{i5  +  (5)  —  ri;(i(5)]  +  z^'^{nS). 


i=0 


Theorem  4.1.  Assume  (A2.1)-(A2.5)  and  let  the  admissible  controls  u^{-) 
converge  to  the  continuous  admissible  control  u(-).  Then  x''’'^(')  converges  to 
x(-)  and  the  costs  converge  as  well. 

Proof.  Let  X°’‘^(’)) denote  piecewise  constant,  right  continuous,  interpola¬ 
tions,  with  intervals  S.  Let  e(t,  6)  denote  a  function  that  goes  to  zero,  uniformly 
in  (t,  O')  on  any  bounded  t-interval,  as  (5  — >  0.  Its  value  might  change  from  usage 
to  usage.  Then,  for  t  =  nS,  the  first  sum  in  (4.7)  is 

n—1 

Hx°'\'iS),u\iS),9  -t  +  iS  +  6)I^_r<e-t+^5+s<o} 

2=0  ^ 

=  [  b{x°'\s),u\s),0  -  t  +  s)I[_r<e-t+s<o}ds  +  e{t,6)  (4-8) 

dOg 

=  [  +  t  -  d),u\x  +  t  -  0),x)dx  +  e{t,S). 

J  max{0— t,— r} 

Since  the  x°’‘^(')  ^nd  u^{-)  are  piecewise  constant  on  the  intervals  [k8,k5  +  6) 
and  0  is  a  (negative)  integral  multiple  of  S,  the  e(t,  S)  term  arises  from  the  9 
dependence  of  b{x,  u,  9).  The  treatment  of  the  second  sum  in  (4.7)  is  similar,  as 
follows.  For  t  =  n6, 

n—1 

9{x°'\'id),u\i6),9  -t  +  i6  +  6)I{_r<0-t+iS+s<o}  [K^  -  t  +  i6  +  S)  -  g{9  -  t  +  i(5)] 

i=0  ^ 

=  [  9{x°'\s),u\s),9  -  t  +  s)I[_r<0-t+s<o}  [pid  -t  +  s)-g{9-t  +  s-  ds)]  +  e{t,  6) 
dOg 

9{x°'\l  +  t-0),  u\x  +  t  -  d),l)p{dj)  +  e{t,  (5). 

(4.9) 


/  max{0— t,  — r} 
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The  contribution  of  the  last  sum  in  (4.7)  is 


n— 1 


p(0  —  t  +  i6  +  5)  +  (5)  —  y°’'^(z(5)]  I[_T-<g_t+is+s<o} 


i=0 


(4.10) 


/  max{0— t r} 


modulo  an  error  that  is  due  to  the  approximation  of  p(-)  by  a  piecewise  constant 
function  and  that  is  bounded  by  e{t,6)  times  the  variation  of  y^’^{-)  on  the 
interval  [t  —  T,t  —  tq],  where  r  >  tq  >  0  is  defined  in  (A2.3).  The  initial 
condition  is  treated  similarly  and,  analogously  to  the  development  in  Theorem 
3.1,  adds  terms  of  the  form  to  the  expressions  computed  above. 


pmax{0  — i,  — r} 

'  — T 

Putting  the  pieces  together  yields 

r9 


0)  =  J  +  7  -  0),u\t  +  7  -  0),x)dx  +  J  +  X-0) 

+  J  9{x°'\t  +  l-d),u\t  +  x-0),x)dy{x) 


+e{t,S)  [l  +  |2:°’‘^|(t-To)-  \z°’^\{t-T)]  , 

X^’\t,0)  =  J  b{x°'^{t  +  x),u\t  +  j),x)dj  +  J  p{l)djy°'\t  +  x) 

+  J  +  X),u^{t  +  X),x)dp{x)  +  [1  + |z°’‘^|(t-ro)  -  \z°'^\{t-T)] 

X°^\t)  =  x{0)+  fx^^\s,0)ds 

t  t 

+  [  c{x°’\s),u\s))ds+  [  a{x°’\s))dw{t)+z°’\t). 

Jo  Jo 

Substituting  (4.11)  into  (4.12)  yields  that  x°’‘^(-)  satisfies  (2.2),  modulo  the  error 
terms.  It  follows  from  Lemma  2.1  and  its  extension  that  the  error  terms  go  to 
zero  as  (5  —>  0  in  that  for  each  T  <  oo 


(4.11) 

(4.12) 


lim  if  sup  e(t,  5)  [l  +  \z^’^\{t  —  tq)  —  \z°’^\{t  —  r)]  =  0. 

<5^0 

The  main  issue  concerns  the  convergence  properties  of  the  reflection  pro¬ 
cess  z^’^{-).  A  weak  convergence  argument  can  be  used.  The  set  of  processes 
(x°’‘^(-), ■u'^(-),  w(-), 2°’'^(-))  is  tight  in  the  Skorohod  topology  and  all  limits  are 
continuous.  One  needs  to  show  that  the  weak  sense  limit  (x(-),  ■u(-),  ■u)(-),  0(-)) 
of  any  weakly  convergent  subsequence  satisfies  (2.2),  where  z(-)  is  the  reflec¬ 
tion  term,  and  all  processes  are  nonanticipative  with  respect  to  the  standard 
Wiener  process  w{-).  The  proof  of  [10,  Theorem  1.2,  Chapter  11]  can  be  used 
to  complete  the  demonstration.  It  yields  that  z{-)  must  be  the  reflection  pro¬ 
cess  for  x{-),  as  well  as  the  nonanticipativity  properties.  Clearly,  {u{-),w{-))  has 


16 


the  same  probability  law  as  (m(-),w(-)).  This  and  the  weak-sense  uniqueness  of 
solutions  implies  that  the  original  sequence  converges,  as  desired. 

The  costs  converge  due  to  this  pathwise  convergence  and  the  continuity  of  the 
function  fc(-),  and  to  the  fact  that,  for  any  T  <  oo.  Lemma  2.1  and  its  extension 
imply  that  the  reflection  terms  on  the  intervals  [nT,  nT  +  T]  are  uniformly  (in 
n,6)  integrable,  since  they  satisfy  (2.4)  and  (2.5)  uniformly  in  (small)  6.  ■ 


5  The  Markov  Chain  Approximation  Method 

5.1  Brief  Review  of  the  No-Delay  Case 

In  preparation  for  the  approximation  for  the  delay  case,  let  us  recall  the  basic 
numerical  procedure  for  the  non-delayed  problem.  The  first  step  is  the  deter¬ 
mination  of  a  finite-state  controlled  Markov  chain  that  has  a  continuous-time 
interpolation  that  is  an  “approximation”  of  the  controlled  reflected  diffusion 
process  x{-).  The  second  step  solves  the  optimization  problem  for  the  chain 
and  a  cost  function  that  approximates  the  one  used  for  a;(-).  Let  h  denote  the 
approximation  parameter.  The  reference  [6]  contains  some  additional  details 
concerning  the  delay  problem.  The  system  of  concern  in  this  subsection  is 

dx  =  h{x,u)dt  +  a{x)dw  +  dz.  (5.1) 

where  x  €  G,  and  z(-)  is  the  reflection  process. 

To  construct  the  approximation,  start  by  defining  Sh,  a  discretization  of 
IR”,  which  we  let  be  a  grid  with  the  distance  between  points  in  any  coordinate 
direction  being  h.  The  precise  requirements  are  quite  general  and  are  spelled 
out  via  the  local  consistency  condition  given  below.  In  general,  the  interval  can 
depend  on  the  coordinate  direction  and  the  discretized  state  space  need  not  be 
a  grid.  It  is  only  the  points  in  G  and  their  immediate  neighbors  that  will  be  of 
interest.  Next  define  the  approximating  controlled  Markov  chain  Its  state 
space  will  be  a  subset  of  Sh,  and  is  usually  divided  into  two  parts.  The  first  part 
is  Gh  =  Gn  Sh,  on  which  the  chain  approximates  the  diffusion  part  of  (5.1).  If 
the  chain  tries  to  leave  Gh,  then  it  is  returned  immediately,  consistently  with 
the  local  reflection  direction.  Thus,  define  9G)J"  to  be  the  set  of  points  not  in  Gh 
to  which  the  chain  might  move  in  one  step  from  some  point  in  Gh-  The  set  9G)( 
is  the  reflecting  boundary  for  the  chain.  Let  p^{x,x\a)  denote  the  transition 
probabilities  at  state  x  under  control  value  a. 

Local  consistency  on  Gh-  Let  uj)  denote  the  controls  used  at  step  n  for 
Let  (respectively,  covar^’“)  denote  the  expectation  (respectively,  the 

covariance)  given  all  of  the  data  to  step  n,  when  =  x,u^  =  a.  Then  the 
chain  must  satisfy  the  following  condition.  There  is  a  function  At^{x,a)  >  0 
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such  that  (the  formulas  define  6^(-)  and  a^(-)) 

^x’,n  [^n+i  —  x\  =  b^{x,  a)At’^{x,  a)  =  b{x,  a)At^{x,  a)  +  o{At^{x,  a)), 
covar^’“  [^n+i  —  x\  =  a^{x,  a)At^{x,  a)  =  a{x)At^{x,  a)  +  o{At'^{x,  a)) 
a{x)  =  a{x)a'{x),  lim  sup  At^(x,  a)  =  0, 

^ — *^0  x^ct 

Un+^-a\<Kih, 

(5.2) 

for  some  real  Ki.  With  the  methods  in  [10],  At^(-)  is  obtained  automatically 
as  a  byproduct  of  getting  the  transition  probabilities.  Thus,  in  G,  the  first  two 
“conditional”  moments  of  ^n+i  ~  very  close  to  those  of  the  “differences” 

of  the  x{-)  of  (5.1).^  Approximations  satisfying  the  local  consistency  condition 
(5.2)  are  to  be  called  explicit  approximations. 

Local  consistency  on  the  reflecting  boundary  dC^.  From  points  in 
5G]|[,  the  transitions  of  the  chain  are  such  that  they  move  to  Gh,  with  the 
conditional  mean  direction  being  a  reflection  direction  at  x.  More  precisely, 
lim;i^o  distance(i9G[[[,  Gft)  =  0,  and  there  are  >  0  and  92{h)  ^  0  as  /i  ^  0 
such  that  for  all  x  G  9G[[', 

Exln  [^^+1  -  x]  G  {ay  :  7  G  d{x)  and  62(9,)  >a>  0ih} , 

At^{x,a)  =  0  for  X  G  dG^. 

The  last  line  of  (5.3)  says  that  the  reflection  from  states  on  dG^  is  instantaneous. 
The  reference  [10]  has  an  extensive  discussion  of  straightforward  methods  of 
getting  good  approximations. 

In  all  cases  in  [10],  the  transition  probability  for  the  chain  in  the  no-delay 
case  can  be  represented  as  a  ratio  of  the  following  form.  For  x  G  Gh, 

P{^i  =  i|^Q  =  X,  Ug  =  a}  =  p^(x,  x|a)  =  A^(x,  x,  a)/D^{x,  a), 

At^{x,a)  =  h? / D^{x,a),  inf  Zl^(x,  a)  >  0,  \x  —  x\  =  0{h) 

X^Oc 

where  D^{-)  are  functions  of  6(-),cr(-)  of  the  form  : 

N^{x,x,a)  =  N{hb{x,a),a{x),x),  D^{x,a)  =  D{hb{x,a),(j{x)).  (5.5) 

The  same  general  canonical  forms  (5.4)  and  (5.5)  will  be  used  for  the  delay 
case.  Define  At[(  =  At^(^([,0,  and  let  denote 

the  minimal  cr-algebra  that  measures  the  system  data  to  step  n.  By  centering 
around  the  conditional  expectation,  we  can  write 

^n+l  =  Cn  +  '^n)  +  Pn  +  +  o{At^),  (5.6) 

®The  consistency  condition  need  not  hold  everywhere.  See  [10,  discussion  in  Section  5.5 
and  Theorem  10.5.3,  and  also  the  discussion  concerning  discontinuous  dynamics  in  Section 
10.2  of  second  ed.j  for  examples  and  more  detail. 
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where  the  martingale  difference  /3^  has  conditional  (on  covariance  u^)/S.t’^. 

The  o(At(()  term  is  due  to  the  use  of  &(•)  in  lieu  of  b^{-)  since  by  (5.2), 

lim  sup  \b^{x,  a)  —  b{x,  a)|  =  0. 

^ — *^0  x,(y. 


Continuous-time  interpolations.  The  discrete-time  chain  is  used  for 
the  numerical  computations.  However,  for  the  proofs  of  convergence,  we  use 
a  continuous-time  interpolation  that  will  approximate  a:(-).  This  is  con¬ 

structed  as  follows.  Let  r'„,  n  =  0, 1, . . . ,  be  mutually  independent  and  exponen¬ 
tially  distributed  random  variables  with  unit  mean,  and  that  are  independent 
of  {it  ut  n  >  0}.  Define  =  YtZo  and 

Then  define  for  t  G  l^tTn+i)-  Define  the  continuous-time  inter¬ 

polations  u^(-)  of  the  control  actions  analogously  and  let  its  relaxed  control 
representation  be  denoted  by  m^(-),  with  time  derivative  Let  z^{-) 

denote  the  interpolation  of  X]r=o  ^^t 

Since  the  intervals  between  jumps  are  where  r'n  is  exponentially  dis¬ 
tributed  and  independent  of  !Ft  the  jump  rate  of  when  in  state  x  is 

1/At'*(a:).  Given  a  jump,  the  distribution  of  the  next  state  is  given  by  the 
p^{x,y\a),  and  the  conditional  mean  change,  for  x  G  Gh  and  control  value 
a  used,  is  b^ {x ,  a) At'^ {x ,  a) .  So  we  can  decompose  V'^(')  ™  terms  of  the 
continuous-time  compensator,  reflection  term,  and  and  martingale  as 

=  x(0) -h  /  b^{'iP^{s),u'^{s))ds  +  M>^{t)  +  z^{t),  (5.7) 

Jo 


where  the  martingale  (t)  has  quadratic  variation  process 
In  terms  of  the  relaxed  control  we  can  write 


a^{i^{s),u^{s))ds. 


f  b^{'ip^{s),u^{s))ds  =  f  f  b^{'ijj^{s),a)m^’'{s,da)ds. 

Jo  Jo  Ju 


It  can  be  shown  that  ([10,  Section  10. 4. 1])  there  is  a  martingale  w^{-)  (with 
respect  to  the  filtration  generated  by  the  state  and  control  processes,  possibly 
augmented  by  an  “independent”  Wiener  process)  such  that 


Mtt)=  f  attt),uts))dwts)  =  f  a{e{s))dwts)  +  ett), 

Jo  Jo 

where  cr'*(-)[(T^(-)]'  =  a^(-)  (recall  the  definition  of  a^(-)  in  (5.2)),  w^(-)  has 
quadratic  variation  process  It  and  converges  weakly  to  a  standard  Wiener  pro¬ 
cess.  The  martingale  e^(-)  is  due  to  the  difference  between  cr(a;)  and  a^(x,a) 
and 

lim  supsupL;|e^(s)p  =  0  (5.8) 

s<t 
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for  each  t.  Thus 


=  x{0)+  f  f  b^{'ip^{s),a)m^’'{da,  s)ds+ f  a{'ip^{s))dw^{s)+z^{t)+e^{t). 

Jo  Ju  Jo 

(5.9) 

The  following  result  [10,  Theorem  1.3,  Chapter  11]  is  an  analog  of  Lemma  2.1. 

Lemma  5.1.  Assume  (A2.1)-(A2.2)  and  that  for  some  constant  K,  the  b{x) 
and  (j{x)  in  (5.2)  are  replaced  by  arbitrary  -measurable  random  variables  that 
are  bounded  in  norm  by  K.  Then  the  corresponding  reflection  terms  satisfy 

lim  limsup  sup  if  (T)  =  0. 

^^0  bi-),a(-),x(0) 


For  any  T  <  oo, 

limsup  sup  (T)  <  oo. 

h^O  6(-),cr(-),a:(0) 

Note  on  convergence.  For  any  subsequence  ft.  ^  0,  there  is  a  further  sub¬ 
sequence  (also  indexed  by  ft  for  simplicity)  such  that  {ip'^ {■) ,  {■) ,  {■) ,  {■)) 
converges  weakly  to  random  processes  (a;(-))  w(-),  ■u;(-),  2:(-)))  where  m(-)  is  a  re¬ 
laxed  control,  {x{-),m{-),w{-),  z{-))  is  nonanticipative  with  respect  to  the  stan¬ 
dard  vector- valued  Wiener  process  w{-),  and  the  set  satisfies 

a:(t)  =  x(0)  +  /  /  b{x{s),a)m'{da,s)ds /  a{x{s))dw{s) z{t), 

Jo  Ju  Jo 

where  z{-)  is  the  reflection  term.  Along  the  selected  subsequence  W^{x,m^) 
W{x,m).  The  proofs  of  these  facts  are  in  [10,  Chapters  10,  11]. 

5.2  The  “Implicit”  Numerical  Procedure 

It  was  noted  in  Section  4  that,  for  the  approximations  used  there,  the  discretiza¬ 
tion  levels  for  time  and  for  the  9  variable  must  be  the  same.  The  time  interval  im¬ 
plied  by  the  Markov  chain  approximation  discussed  above  is  At^{^l^,  upf)  = 
which  is  commonly  of  the  order  of  ft^,  but  is  not  usually  a  constant.  The  transi¬ 
tion  probabilities  can  be  modified  so  that  Atp  =  At^,  a  constant,  also  of  order 
0{hf).  If  we  used  a  discretization  of  the  0  variable  with  levels  0{h^),  there 
would  be  far  too  many  ft-values  for  practical  use.  An  alternative  approxima¬ 
tion,  known  as  the  implicit  method  [10,  Chapter  12],  avoids  such  problems  and 
will  be  readily  adapted  to  and  particularly  useful  for  the  delay  problem.  The 
fundamental  difference  between  the  approximation  discussed  above  and  the  so- 
called  implicit  approaches  to  the  Markov  chain  approximation  lies  in  the  fact 
that  in  the  former  the  time  variable  is  treated  differently  than  the  state  vari¬ 
ables:  It  is  a  true  “time”  variable,  and  its  value  increases  by  Atp  at  step  n.  In 
the  implicit  approach,  the  time  variable  is  treated  as  just  another  state  vari¬ 
able.  It  is  discretized  in  the  same  manner  as  are  the  other  state  variables:  For 
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the  no-delay  case,  the  approximating  Markov  chain  has  a  state  space  that  is  a 
discretization  of  the  (x,  t)— space,  and  the  component  of  the  state  of  the  chain 
that  comes  from  the  original  time  variable  does  not  necessarily  increase  its  value 
at  each  step.  The  idea  is  analogous  when  there  are  delays.  Let  5  >  0  be  the 
discretization  level  for  the  time  variable.  We  will  assume  that  S  =  0{h).  Let 
p^’^{x,nS]x,nS\a)  denote  the  probability  that  the  state  x  moves  to  x,  and  the 
time  variable  does  not  advance,  under  control  a.  Let  p^’^{x,  n6]  x,  nS  +  i5|a)  de¬ 
note  the  probability  that  the  state  does  not  change  but  that  the  time  advances. 

Given  the  transition  probabilities  and  interpolation  interval  p^ {■) ,  At^ {■) , 
those  for  the  implicit  method  can  readily  be  computed  [10,  Section  12.4].  Sup¬ 
pose  that  at  the  current  step  the  time  variable  does  not  advance.  Then,  condi¬ 
tioned  on  this  event  and  on  the  value  of  the  current  spatial  state,  the  distribution 
of  the  next  spatial  state  is  just  the  p^{x,  x\a)  used  previously.  So  one  needs  only 
determine  the  conditional  probability  that  the  time  variable  advances,  condi¬ 
tioned  on  the  current  state.  This  is  obtained  by  a  “local  consistency”  argument 
and  no  matter  how  the  p^{-)  were  derived,  the  (no-delay)  transition  probabili¬ 
ties  p^’^{-)  and  interpolation  interval  for  the  implicit  procedure  can  be 

determined  from  the  p^(-),  At^(-)  by  the  formulas  [10,  Section  12.4],  for  x  €  Gh, 


h,  ,  p^'^{x,n5-,x,n5\a) 

^  ^  l-p’^^^{x,n5-x,n5  +  5\ay 

p^'yx,n5]x,n5  +  5\a)  = 

a)  +  d 

At^’^x  a)  = 

[x,a)  Ath{x,a)  +  6' 


(5.10) 


The  reflecting  states  are  treated  as  before.  One  usually  first  computes  the 
transition  functions  for  the  explicit  case,  where  they  must  satisfy  (5.2),  and  then 
uses  those  to  get  the  transition  probabilities  for  the  implicit  case  via  (5.10). 

Let  and  (j)^^  denote  the  spatial  and  temporal  states  at  step  n  under 
(5.10).  Deflne  At^^  =  u^^).  Under  (5.10),  (5.2)  holds  for  the  process 

with  the  time  interval  At^’^{-).  We  have  =  At^^  and 

=  +  +  (5.11) 


where  the  covariance  of  the  martingale  difference  term  /3g is  o{Aty^).  Let 
denote  the  reflection  term  for  the  process,  and  deflne  the  components  Sy^’^ 
by  =  y2i  di5y^n  -  Analogously  to  what  was  done  for  the  explicit  case,  deflne 
At^^  =  VnAty^,  and  The  continuous  time  interpolation 

is  for  t  G  [Tu'^ with  the  analogous  interpolations 

and  z^’^{-)  for  the  controls  and  reflection  term  ^  resp. 


Asymptotic  equivalence  of  the  time  scales.  For  each  t  >  0,  deflne  the 
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stopping  times 


d^{t)  =  max 


:  ^  At>l  = 


i=0 


=  max 


:  Atf =  t 


h.S 


i^O 


Note  that  d^{t)  will  never  be  the  index  of  a  reflecting  state,  since  the  time 
intervals  for  those  are  zero.  Define  d'^{t)  and  d^’^{t)  analogously,  but  with  At^ 
and  At^’^  used,  resp. 


Theorem  5.2.  For  each  t  >  0, 


lim  sup  E  sup 

S<t 


>(s) 


(5.12) 


Also, 


lim  sup  E  sup 

?i^0.<5^0„h,5  s<t 


E 


2  =  0 


(5.13) 


The  last  assertion  holds  with  d^’^{-)  replacing  d^’^{-)  and  also  with  At^{l  — 
replacing  At^’^.  Let  denote  the  interpolation  of  the  with  the  intervals 

At!f’^ .  Then  converges  weakly  to  the  process  with  value  t  at  time  t.  The 

result  of  the  last  sentence  holds  if  the  intervals  are 


Proof.  Owing  to  the  mutual  independence  of  the  exponential  random  variables 
{t'n}  and  their  independence  of  everything  else,  the  discrete  parameter  process 
Ln  =  ^^=o(At^  —  At^)  is  a  martingale.  Hence,  the  conditional  expectation  of 
the  squared  term  in  (5.12)  given  the  {At^}  equals 


>(i) 

E  [Arl^-At^f 

Ati,  i  <  oo 

2=0 

<T(t) 

E  ^  (^  +  sup  At(()  sup  At((  A  0, 

.  „  n  n 


which  yields  (5.12)  since  by  Doob’s  inequality  and  the  martingale  property, 
ifsupj<„|Ljp  <  AE\Ln\^.  Equation  (5.13)  and  the  assertion  following  it  are 
proved  in  the  same  way. 

For  the  next  to  last  assertion  of  the  theorem,  write 


2=0  2=0 

The  first  sum  equals  t,  modulo  sup„  At!f’^ .  The  variance  of  the  martingale  term 
is  5t,  modulo  (5  +  sup„  At’f’^,  and  the  term  converges  weakly  to  the  zero  process. 
This  yields  the  next  to  last  assertion  of  the  theorem.  This  and  (5.13)  yields  the 
last  assertion  of  the  theorem.  ■ 
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6  Approximating  the  System  With  Delays 

The  numerical  approximations  and  convergence  proofs  for  the  delay  problem 
will  now  be  developed.  Although  the  form  of  the  algorithm  is  motivated  by 
those  used  for  the  no-delay  problem,  it  is  more  complicated.  But  the  delay 
problem  is  much  more  complicated  and  we  are  interested  in  algorithms  that 
ameliorate  the  memory  requirements.  It  is  seen  from  the  discussion  in  Section 
4  that  the  time  variable  and  0  need  to  have  the  same  increments  if  there  is  to 
be  convergence  to  the  correct  values.  Discretizing  9  and  then  X^(t,  0)  so 

that  (5.2)  holds  for  the  resulting  process  would  require  time  and  9  intervals  of 
order  0{h?),  so  that  9  would  take  0{l/h?)  values,  much  too  high.  The  implicit 
method  for  the  Markov  chain  approximation  can  be  adapted  to  alleviate  these 
problems. 

Let  T  be  an  integral  multiple  of  5  and  let  9  take  values  in  =  {— r  -I- 
d, . . . ,  —S,  0}.  Owing  to  the  boundary  condition  — r)  =  0  there  is  no  need 
for  9  =  —T.  The  Markov  chain  approximating  {x^{t),x^{t,9),—T  <  9  <  0)  will 
be  denoted  by  &  G  T^)-  The  will  take  values  in  GhUdG+ 

with  instantaneous  reflection  back  if  it  leaves  Gh,  in  accordance  with  (A2.1) 
and  (A2.2).  We  suppose  that  for  each  9  G  T^,  ^n^’^{9)  takes  values  in  a  regular 
ft,-grid.  Any  interval  of  order  0{h)  can  be  used,  but  the  notation  becomes  more 
awkward. 

Boundaries  for  The  &re  in  G,  hence  bounded.  If  p(-)  =  0, 

then  (3.9)  shows  that  \x^{t,9)\  is  bounded  by  \b\{T+9)+\g\^{[—T,9])  where  |6|,  Ig] 
are  the  sup  values,  and  \^n^’^{9)  \  can  be  taken  to  be  bounded  by  slightly  higher 
values.  One  could  stop  the  process  ^n^’^{9)  when  it  reached  these  values.  In 
the  limit,  as  h,S  ^  0,  the  ^^^’^{9)  would  not  reach  the  boundaries  and  no  other 
precautions  need  to  be  taken  in  the  construction  of  the  algorithm.  However, 
if  there  is  a  delayed  reflection  term,  then  the  bound  is  changed  by  the  middle 
term  in  (3.9).  This  term  is  bounded  by  a  constant  times  the  variation  of  z{-) 
on  [t  —  TQ,t  —  t],  which  satisfies  Lemma  5.1  and,  in  the  limit.  Lemma  2.1.  For 
numerical  purposes,  we  need  to  bound  |Cn  ^’"^(fi*)!.  This  will  be  done  by  stopping 
it  when  it  reaches  some  level  such  that  the  probability  that  the  level  is  exceeded 
by  |x^(t, 6*)|  on  some  large  time  interval  [0,r]  is  small.  The  ^n^’^{9)  will  be 
stopped  for  all  9  if  any  one  hits  the  boundary.  The  required  bound  depends  on 
the  discount  factor  (3  and  the  value  of  r.  If  the  bound  is  reached  for  small  h,  6, 
then  there  will  be  an  error  in  computing  the  cost,  the  error  depending  on  the 
value  of  at  that  time.  A  reasonable  procedure  is  to  have  the  boundaries  at 
least  twice  what  is  needed  if  there  were  no  delayed  reflection  term.  If  a  boundary 
is  ever  reached,  the  drop  the  reflection  term  and  continue  with  the  others.  Until 
Theorem  6.3,  ignore  these  boundaries. 

The  algorithm  for  the  implicit  procedure.  As  in  Section  5  for  the  so-called 
implicit  method,  let  denote  the  time  variable.  The  algorithm  is  slightly 
different  from  the  implicit  method  defined  in  Section  5.  As  there,  the  steps  can 
be  divided  into  two  classes,  corresponding  to  the  time  variable  advancing  or  not. 
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Owing  to  the  coordination  between  the  advance  in  the  time  variable  and  the 
“shift”  in  9  associated  with  the  operator  if  time  advances  at  step  n,  then 
the  transitions  are 


^1M,5 

Sn+1 


{0)  =  i 


l,h,5 

n 


(0-6), 


Sn+1 


^0ih,5 


ih,6 

^n+1  ~  ' 


ih.S 


s. 


(6.1) 


The  transition  in  (9) ,  9  G  T^)  if  the  time  variable  does  not  advance 

at  step  n  will  be  defined  below.  Let  denote  the  control  applied  at  step  n 
and  let  denote  the  minimal  cr-algebra  that  measures  the  system  data  to 
step  n,  with  the  associated  conditional  expectation  denoted  by  . 

Recall  the  canonical  form  of  the  transition  probabilities  and  time  interval 
in  (5.4),  (5.5).  The  identical  forms  will  be  used  here  for  the  transitions  of  the 
component  Thus,  by  (5.4),  (5.5),  for  =  x°  G  and 

control  value  a  used,  the  probability  that  takes  the  value  conditioned 

on  the  event  that  the  time  variable  does  not  advance  at  step  n,  is 


(x^,  X 


N  (/i(x^  +  c(x°,  a)),  (j(a;°), 

D  {h{x^  +  c(x^,  a)),a(x^)) 


=0(h).  (6.2) 


Thus  the  transition  probability  for  has  the  same  dependence  on  the  full 

drift  vector  and  covariance  matrix  as  for  the  non-delay  case.  Indeed,  with  the 
correct  drift  and  covariance  used,  any  of  the  algorithms  in  [10,  Chapter  5]  can 
be  used  to  get  (6.2). 

Next  define  the  interval  At^(x^,x^,a)  =  h^/lD(h(x^  +  c{x^ ,a)),a{x^))]. 
Define  the  probability  that  the  time  variable  advances  at  step  n  by  (5.10), 
namely. 


p^’^{x^,  nS;  x°,  n6  +  (5|q;,  x^) 


At^{x°,  x^,a) 
At^{x'^,  x'^,a)  +  S' 


(6.3) 


Also,  from  (5.10),  define  the  interval  for  the  implicit  procedure: 


At^'^{x^ ,  x^  ,a) 


5At^{x'^,  x^,a) 
At^{x'^,x^,a)  +  S' 


If  =  a;°  ^  Gfi,  then  the  refiection  back  to  Gh  is  instantaneous  in  that 

At^{x^,  x^,a)  =  0,  and  it  is  in  accord  with  (A2.1)-(A2.2).  Set  Sz^^'^  =  — 

^0./i,(5l  j 


At  step  n  we  first  decide,  according  to  (6.3),  whether  time  advances  or  not. 
The  evolution  of  the  time  variable  can  be  decomposed  as  (5.11).  Let  denote 
the  indicator  function  of  the  event  that  the  time  variable  advances  at  step  n. 
By  the  local  consistency  condition,  for  G  Gh  and  on  the  event  that  time 

does  not  advance,  we  must  have 


E 


=  At  [^^’"’'(0)  +  c(^: 


covar 


h,5,<A 


Sn+1  Sn 


=  a(e"’')At((  +  o(At((). 


)]  +  o(AtJ(), 


(6.4) 
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Thus,  we  can  write 


^0,h^6  ^0,h,S 

Sn+1  Sn 


<’")  +  (1  -  4"’")  +  o(At:i), 


where  the  conditional  covariance  of  the  martingale  difference  is  AtJ^+ 

o(AtJj).  By  centering  about  its  conditional  expectation,  the  above  expres¬ 
sion  can  be  written  as 


Ctl  =  +  0{At), 

(6.5) 

where  the  conditional  covariance  of  the  martingale  difference  is  AtJ)’'^- 

o(A4’‘^).  Define  =  E  [(54’^’'^  14’“^  =  Then  we  can  write  = 

,  where  can  only  increase  if  the  refiection  is  to  the  tth  face 

of  G.  The  error  terms  Sz^’^’^  —  Sz^’^’^  are  asymptotically  negligible  and  will  be 
ignored;  see  [10,  Theorem  Equation  11.1.16],  where  a  slightly  different  notation 
is  used.  The  values  of  6y^’^’^  and  6z^’^’^  are  known  at  step  n. 

The  rule  for  (6.1)  was  easy  to  establish  since  x°(t)  evolves  as  a  diffusion,  so 
any  of  the  algorithms  in  [10]  could  be  readily  adapted.  To  develop  an  algorithm 
for  the  ^n^’^{0)  we  use  the  development  in  Section  4  as  a  guide.  The  change 
between  updates  of  the  time  variable  uses  the  three  right  hand  terms  of  (4.3) 
as  a  guide.  Since  there  are  usually  several  steps  between  updates  of  the  time 
variable,  those  three  terms  are  approximated  as  a  sum  of  terms.  We  proceed  as 
follows. 

If  time  does  not  advance  at  step  n,  then  4+1''  \6)  —  is  to  satisfy 


E 


=  Qn’\0), 


(6.6) 


where  (note  that  there  is  no  “shift  term”  in  9  since  time  is  not  advancing) 


qn'\d)  =  6(4>'*’^<•^0)A4 +p(0)<54^-^ 


0,4, (5  .^4,(5 


HO)  -  Kd  -  s)]  ^  u 


and  the  boundary  condition  is  =  0.  To  attain  the  conditional  mean 

(6.6),  one  randomizes  between  grid  points  that  are  closest  to  q^’^{9).  Write 


fl,h,S 

Sn+1 


(0)  -  en^’\0) 


(1-44  =  [4’40)  +  4’'4)]  (1 


(6.7) 


where  Pn’^{d)  is  the  randomization  noise.  The  randomization  can  be  correlated 
in  6,  but  is  independent  in  n.  Theorem  6.1  shows  that  the  noise  due  to  the 
randomization  is  asymptotically  negligible.  Recall  that  if  Sy^^’^  yf  0,  then 
A4  =  0  and  conversely. 

If  we  sought  to  add  the  terms  corresponding  to  the  right  hand  three  terms 
in  (4.3)  only  at  the  steps  when  the  time  variable  is  advanced,  we  would  have 
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to  keep  track  of  the  running  sums  of  the  between  such  updates,  which 

would  amount  to  an  additional  state  component.  No  doubt  there  are  other 
forms  that  are  advantageous. 

The  cost  function  is  a  discretization  of  (2.6).  The  initial  data  X°(0)i  ’) 

for  (3.1),  (3.2),  is  the  function  of  the  initial  path  and  control  segments  x,u, 
resp.,  as  given  by  (3.3).  Without  loss  of  generality,  we  can  suppose  that  the 
values  of  the  initial  data  (3.3)  are  on  the  grid  for  each  6.  If  control  u{-)  is  used 
on  [0,oo),  then  the  cost  function  can  be  written  as 


W'^’^{x%0),x\0),u)  =  W'^’\x,u,u) 


[KC 


(6.8) 


i=0 


where  ^  denotes  the  expectation  given  initial  data  x,  u  and  the  use  of  control 
■u(-)  on  [0,oo).  Let  V^^’'^(x°(0);  X^(0))  =  V^’^{x,u)  denote  the  infimum  of  the 
costs. 

Let  Vn,n  =  0,1,...,  be  mutually  independent  and  exponentially  distributed 
random  variables  with  unit  mean,  and  that  are  independent  of  ^ 

n  >  0}  and  the  initial  data.  As  in  Section  5.  define  The  ex¬ 

pression  (6.8)  is  asymptotically  equivalent  to 


-/3t, 


h,6 


[HC. 


0,h,S 


5)A 


-h.S 


q'Sy, 


i^O 


(6.9) 


To  assure  that  (6.9)  is  well  defined,  note  that  for  any  constant  D  >  0  and  v 
being  exponentially  distributed  with  mean  unity,  Ee~'"^  =  1/[1  -I-  I?].  This 
implies  that  the  exponential  in  (6.9)  is  equivalent  to  Y\a=o  +  ]>  which 

implies  that  (6.9)  is  well  defined  and  shows  shows  the  asymptotic  equivalence 
between  (6.8)  and  (6.9)  as  well.  The  form  (6.8)  (with  perhaps  a  numerically 
convenient  approximation  to  the  exponential)  would  be  used  in  the  numerical 
computations  and  the  form  (6.9)  in  the  proofs  of  convergence. 

Recall  the  continuous  time  interpolation  V'^(')  in  Section  5.  Define  = 
At/*’*^.  Set  for  t  G  and  define  the  interpo¬ 
lations  M^’'^(-),  and  analogously. 

Let  be  the  relaxed  control  representation  of  with  the  derivative 

at  t  denoted  by  Analogously  to  (5.7)  and  (5.9)  we  can  write 


t 

+  [  [  c{i;°’^’\s,a)m^’^’'{da,s)ds  +  + 

Jo  Ju 


The  process  ^o>^’‘5(.)  is  asymptotically  equivalent  to  z^’^’^{-).  There  is  a  mar¬ 
tingale  with  quadratic  variation  process  It  and  which  converges  weakly 

to  a  Wiener  process  as  {h,  (5)  ^  0  such  that 
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s 


where  satisfies  (5.8). 

Define  Vq’^  =  0  and,  for  n  >  0,  set  =  min{z  >  v^’'^  :  ~  4’i’^  =  '^}- 

The  next  step  is  to  show  that  the  noise  due  to  the  randomization  done  to  attain 
the  conditional  mean  in  (6.6)  is  negligible.  This  is  (6.10)  below.  The  expression 
(6.11)  shows  that  C;'i’^’‘^(0)  changes  little  between  the  steps  that  the  time  variable 
changes. 

Theorem  6.1.  Assume  (A2.1)-(A2.5)  and  that  At^(a;°,  a;^,  a)  =  0{h^).  Then, 
for  any  t  <  oo, 


lim  sup  E  sup 

h,S^0  uh,li^x,u,0 


n—1 


i=0 


=  0, 


where^ 


h,5  T 
V  ■  T  —  1 
1+1 


where  p^'^{9)  is  the  randomization  noise  defined  in  (6.7).  Also, 


(6.10) 


lim  sup  sup  E  sup 


^i...5(o)_^PM^(0) 


=  0.  (6.11) 


Proof.  For  notational  simplicity,  write  simply  as  u„.  First,  let  ^  Gh, 

so  that  we  are  at  a  reflection  step,  and  consider  the  randomization  noise  associ¬ 
ated  with  realizing  the  conditional  mean  p{6)6y^’^’^ .  Without  loss  of  generality, 
suppose  that  it  is  real-valued.  Suppose  that  p{6)6y^’^’^  lies  in  Inh+h]  where 
In  is  a  (random)  integer,  either  negative  or  positive.  Then  randomize  between 
the  end  points  to  get  the  desired  mean  p{9)Syn’^’^.  To  evaluate  the  variance 
(always  conditioned  on  +0’’^),  without  loss  of  generality  we  can  suppose  that  we 
have  shifted  the  means  so  that  ?„  =  0.  Then  the  probability  of  selecting  h  is 
p{6)5yn’^’^ /h  and,  since  6yn’^’^  =  0{h),  the  conditional  variance  is 


[h-p{e)d£’>^’^[ 


2  p{9)dy\ 


nO,h,S 


-  +  [p{e)Sy[ 


-0,h,S]2 


1  - 


p{9)6y: 


0,h,6 


h 


=  O{hM0)Sfn^’'l 
(6.12) 


To  evaluate  the  total  effect  on  the  interval  [0,t],  define  the  effect  on  the 
randomization  error  due  to  the  reflection  steps: 


r.+  1-l 


Qn’^W  =  E  fikYW  -  C/ ’"’'(0))  -p{0)Sy'^’’^’i 


VGh}' 


^The  is  over  all  stopping  times  no  bigger  than  the  time  needed  to  get  to 

interpolated  time  t. 
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Then  evaluate 


2  =  0 


Sin—i—l^h^S 


Qrw, 


(6.13) 


which  is  the  total  effect  of  the  randomization  noise  on  The  summands 

in  the  last  two  expressions  are  martingale  differences  since 


E, 


h^5 

Vn-\-l 


(4"4+i(^)  =  «• 


By  this  martingale  property  the  mean  square  value  of  (6.13)  is 

n-l  Vi^i-1 

i—0 


'{cr-vcfe}- 


Now,  using  the  conditional  variance  computation  (6.12),  we  see  that  the  contri¬ 
bution  to  (6.10)  due  to  the  reffection  steps  is 


0{h)E 


^o,h,s 


2  /  X 

(t  -  'To) 


0,/i,(5 1 


(t-r) 


By  Lemma  5.1,  this  satisfies  (5.8). 

Next  consider  the  errors  due  to  the  non-refiection  steps.  Suppose  that  it  has 
been  decided  that  time  does  not  advance  at  step  n  and  that  G  Gh-  Then, 

given  this  information,  the  conditional  mean  of  the  increment  {0)—^n^’^{9) 

is  qll'^{9),  with  p{0)5y'^^'^  =  0.  The  conditional  variance  of  Cn+A^)  ~  ■?«  ^’‘^(^) 
is  bounded  above  by  that  obtained  if  each  of  the  two  remaining  components 
were  randomized  separately.  So,  first  consider  the  term  ,  u!^’^ ,  9)At^  and 

set  the  term  involving  /r(-)  to  zero.  Suppose  (w.l.o.g.)  that  we  have  centered  the 
6(^°’^’‘^,  0)At((  so  that  they  are  in  Then  choose  h  with  probability 

,Un’^ ,9)At'^/h.  This  yields  a  conditional  variance  of  order  0{h)At'^. 
Now  redefine 


Gn  +  1  — 

=  E  4’^  9)At^ 


'{{“•'‘•"eGh}- 


As  above,  the  summands  are  martingale  differences  and  the  mean  square  value 
of  (6.13),  with  the  new  definition  of  Q^’^{9)  used,  is 


n— 1 


eJ2{<^ 


(5172  —  2—1^/2,(5 


2=0 


f'i  +  l-l 

E 

l^Vi  +  l 


X/{^O.fe,6gg,^}. 


(6.14) 

Since  E^^^[vn+i  —  u„]  =  0{l/h),  and  the  expectation  (conditioned  on 
of  the  typical  summand  of  the  inner  sum  of  (6.14)  is  0{h)At^  =  0{h^),  the 
expectation  (conditioned  on  of  the  inner  sum  is  0{h?).  So  (6.14)  is  0{h^) 
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times  the  mean  number  of  time  updates  needed  to  get  to  interpolated  time  t. 
But  this  is  0{t/h).  So  the  contribution  to  the  left  side  of  (6.10)  is  zero. 

Now  consider  the  component  of  Qn^iO)  defined  by  (setting  b{-)  =  0) 


[fi{0)  -  fj,{e  -  j)] 

5 


9iC 


0,h,6 


h,S 


,9)  At 


h 

n ' 


Again,  suppose  that  we  have  centered  so  that  it  is  in  [0,  h].  Then  select  h  with 
probability  q!^'^{9)/h.  The  conditional  (on  variance  of  {9) 
is  0{h)q^’^ {9)  +  [qn'^{9)Y  =  0{h)q^'^{9).  To  evaluate  the  total  effect  on  the 
interval  [0,t],  redefine 


Q: 


/i,(5 


'{{“•''•■'eGh}- 


Then  evaluate  (6.13)  with  this  new  definition  of  Q’^'^{9).  Since  we  set  &(•)  =  0, 
the  summands  are  martingale  differences  in  that 


e: 


h,6 

Vn-\-l 


(4"4+l(^)  -  4+'(^))  -  &(^)]  =  0- 


With  the  new  definition  of  QnWy  mean  square  value  of  (6.13)  is 


n— 1  ui+i  — X  2 

E  [(?/+i’'W  -  e/’'‘’'(0))  -  9f’'(0)]  ■  (6.15) 

2  —  0  l  — 

The  (conditioned  on  expectation  of  the  inner  sum  in  (6.15)  is 


o{h^)  {^9)  -  KO  -  S)f  +  (Kd)  -  9{e  -  ^)) 


E^f[vi+i  -  Vi 


=  0{h)  (m(0)  -  9{0  -  5)f  +  {ti{9)  -  f,{9  -  <5)) 


Now  taking  the  shift  [<1)'^]"“®“^  in  (6.15)  into  account  and  noting  that  fj,{9)  — 
^{9  —  S)  =  0  for  0  <  — r,  we  see  that  (6.15)  is  0{h),  uniformly  in  n.  Thus  (6.10) 
holds.  The  verification  of  (6.11)  is  straightforward  and  the  details  are  omitted. 


A  continuous  time  interpolation.  Recall  the  definition  of  At^’^  below 
(5.11).  Define  =  ^^’^'^{9),  and  =  (f)^ 

for  t  G  [Tn’^ Recall  the  discussion  of  the  boundaries  on  ^n^’^{9)  at  the 
beginning  of  this  section.  They  are  ignored  in  the  next  theorem  but  reintroduced 
in  Theorem  6.3. 
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Theorem  6.2.  Assume  (A2.1)-(A2.5)  and  that  AdA ,  a)  =  0{h^).  Then 


f  _  f  f  dj  f  b{ip^'^{t  +  ^),a,'y)m^'^''{da,t  +  l) 

Jq  ^  u 

^  Jo  I-  + 

+  [  dt  [  dfj,{-f)  [g{il;^’^{j  +  t),a,'y)m'^'^''{da,-f  +  t)  +  Po'\T), 
Jo  J-T  Ju 


where 


lim  sup  £^sup  |po’'^(5)|  =  0. 
h,5  »-0  u^,s  ^  {I  s<t 


(6.16) 

(6.17) 


Proof.  Continuing  to  use  Vn  =  ,  we  can  write 

-  ^)  +  By{0)  +  p^ie)  +  Gy{e)  +  R^J{e).  (6.18) 

where  RG^{9)  was  defined  in  Theorem  6.1  and 

py{9)  =  E  =  pi(^)  [ttj  -  > 

l^Vr^  +  l 

+  /\fh 

g:(’^(0)  =  k0)-m^-^)]  e 

l=v„  +  l 

Until  further  notice  ignore  the  effects  of  the  initial  condition  (3.3).  First  con¬ 
sider  the  contribution  of  the  b{-)  terms  to  'tp^’^’^{t,9).  Their  total  contribution 
to  ^l’^+\i9)  is,  for  n  >  1, 


i=0 

Vn  —  1 

=  E  KC^°’'*’^uf’^6l-r^(5  +  (^if’'^  +  (5)A^f(l- J;'*’'^). 


i=0 


i=0 


This  is  equal  to 


Vn  —  1 

E  w^^  9-n6  +  (6.19) 

1=0 

plus  a  martingale  “error”  whose  quadratic  variation  process  is 

Define  7^’'^(f)  =  max{n  :  <  t\.  The  number  of  time  advances  that  have 
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occurred  when  interpolated  time  t  is  reached  is  For  <  t  the 

values  of  0)  and  differ  by  the  contributions  of  the  iterates 

in  the  interval  [V(j,h,si^^ys  +  1,  —  1],  those  that  occur  before  interpolated 

time  t  is  reached  but  at  or  after  the  last  update  of  the  time  variable  before 
interpolated  time  t  is  reached.  By  (6.11),  the  contributions  of  these  terms  for 
0  =  0  is  asymptotically  negligible.  Letting  n  =  and  adding  these 

terms,  (6.19)  becomes 


1=0 

By  Theorem  5.2  and  the  continuity  of  &(•),  the  last  expression  equals,  modulo 
an  error  that  satisfies  (6.17), 


E  ^ 


s/  5 


1-1 


h^5 


'-^+E^^fc 


Ar, 


h,5 


Z=0 


which  for  0  =  0  can  be  written  as  (modulo  an  error  satisfying  (6.17)) 


{s),u^'^{s),—t  +  s)  ds  =  ( 

J  rr 


jc{  — t,  — r}  J  U 


max{0,t— r} 


b  (■0°’^’‘^(s),  M^’‘^(s),  —t  +  s)  ds 
m^'^''{da,  s  +  7)(i7, 


(6.20) 

where  the  last  equality  uses  the  change  of  variable  7  =  — t  +  s,  the  fact  that 
b{-,9)  =  0  for  0  <  — r,  and  switches  to  relaxed  control  notation.  The  above 
mentioned  martingale  error  process  has  quadratic  variation  0{h)  and  satisfies 
(6.17). 

Next,  consider  the  contribution  of  the  p{9)Sy^’'^’^  terms  to  Fol¬ 

lowing  the  development  for  the  6(-)  terms  above,  the  contribution  is 


1=0 

By  Theorem  5.2,  is  asymptotically  equivalent  to  and  Thus,  by 
Theorem  5.2  and  the  continuity  of  p(-),  the  right  hand  side  can  be  written  as 

^  p(0_i  +  7-^VyE' 

1=0 

modulo  an  error  that  can  be  written  as  e{h,6)  [|2;°’^’'^|(t)  —  —  r  —  (5)]  , 

where  e{h,S)  — >  0  as  /i,  <5  ^  0.  By  Lemma  5.1,  the  error  term  satisfies  (6.17). 
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Then,  for  0  =  0,  by  a  change  of  variable  and  using  the  fact  that  p{9)  =  0  for 
9  <  —T,  we  can  write  the  last  expression  as 

f\{-t  +  s)df’’^’\s)=  f  +  (6.21) 

^0  t/ max{— t,  —  r} 

Now  consider  the  contribution  of  the  terms  involving  g(-)  to  the  integral 
Jq  tp^''^'^is,0)ds.  By  (6.11),  to  evaluate  this  integral,  we  can  suppose  that 
^i,/i,(5(o)  jg  constant  on  the  intervals  between  updates  of  the  time 

variable.  Define  N^’^{T)  =  (j)^’^{T)/6.  Then,  making  the  piecewise  constant 
approximation,  the  integral  is  (modulo  an  error  satisfying  (6.17)), 


Nh.s^T)-! 


—h,S 

T  —  T 

^n+l  Vn 


(6.22) 


n—0 


Until  further  notice,  consider  only  the  part  that  is  due  to  the  g{-)  terms  and 
continue  to  ignore  the  effects  of  the  initial  condition.  Then  we  have,  for  n  >  1, 

n—1 

e.:^'\0)  =  J2G’:'\9-nS  +  iS  +  S). 


2  =  0 


By  the  definition  of  this  last  expression  can  be  written  as 


E 

2  =  0 


[fi{0  —  n6  iS  6)  —  fi{9  —  nS  iJ)] 


l=Vi  +  l 


(6.23) 


The  sum  (6.22)  will  only  be  changed  by  a  quantity  satisfying  (6.17)  if  we  replace 
the  inner  sum  in  (6.23)  by 

Gi’^{—n  +  i)=  y]  ,uf’^ ,9  —  n6  +  i6  +  6)AtI^’^ . 


Recall  that  g{-,9)  =  0  and  p{9)  =  0  for  0  <  — r.  Let  0  =  0.  By  a  change  of 
variable  n  —  i  =  q,  the  use  of  (•)  in  lieu  of  G^ (•),  and  a  change  in  the  order 
of  summation,  we  can  write  (6.22)  as 


E  H-qS  +  S)  -  p{-qS)]  ^  G^-qi-q) 

q—1  n—q 


^h,5  n-h-,S 

T  —  T 

'2;^+!  ‘Vn 


(6.24) 


The  fraction  on  the  right  can  be  replaced  by  unity,  only  incurring  an  error  that 
satisfies  (6.17).  Then  we  can  write  the  inner  sum  as  (modulo  an  error  satisfying 
(6.17)) 

f  '  g{^°’^’\s),u’^’\s),-qS  +  6)ds. 

Jo 
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Next,  using  this  expression  and  the  continuity  of  g{-),  (6.24)  can  be  approxi¬ 
mated  by 


/  max{— T,— r} 


f'T+'f 


which  is  equal  to 


I  gi^P^’>^’\t  +  j),u’^’\t  +  j),^)dt.  (6.25) 

By  Theorem  6.1,  the  contribution  of  the  randomization  errors  pf’^{6)  to  the  left 
side  of  (6.16)  satisfies  (6.17). 

It  can  be  shown,  via  analogous  computations,  that  adding  the  effects  of  a 
discretization  of  the  initial  condition  (3.3)  changes  the  inner  integral  in  (6.25) 
to  /g  ,  and  in  (6.20)  and  (6.21)  changes  the  integral  to  .  The  few  details 
are  omitted.  Finally,  making  these  changes,  integrating  the  resulting  (6.20) 
and  (6.21)  over  [0,r],  and  writing  (6.25)  in  relaxed  control  notation,  yields  the 
theorem.  ■ 


Convergence  of  the  numerical  algorithm.  Recall  the  representation  (6.5). 

The  continuous  time  interpolation,  (with  intervals  of  the  mar¬ 
tingale  is  ^  martingale  with  quadratic  variation  process  {s))ds 

plus  an  error  that  goes  to  zero  as  ft-,  5  —>  0.  The  next  theorem  shows  that  the 
optimal  values  computed  by  the  numerical  algorithm  converge  to  the  optimal 
value  of  the  original  problem  as  ft,  ft  ^  0. 

Theorem  6.3.  Assume  (A2.1)-(A2.5)  and  that  At^{x^,x^,a)  =  0{hP).  Sup¬ 
pose  that  there  is  no  delayed  reflection  term  and  that  the  boundaries  on 
are  large  enough  so  that  they  would  not  he  exceeded  by  x^(t, ft).  Then  there  is  a 
martingale  with  quadratic  variation  process  It,  that  converges  weakly  to 

a  Wiener  process,  and  for  which  {modulo  a  term  that  goes  to  zero  as  h,S  ^  0) 

M'^’^it)  =  [  a{p'^'\s)dw'^'\s).  (6.26) 

Jo 

For  any  sequence  of  controls  for  the  chain,  the  set  {tp^A,s  m^’^{-),w^’^{-), 
{interpolation  intervals  is  tight  in  the  Skorohod  topology  and  converges 

weakly  to  a  solution  to  (2.3).  The  optimal  costs  for  the  chain  {&) ,  &  € 

T^},  and  cost  function  (6.8)  converge  to  the  optimal  cost  for  original  process 
(2.3)  and  cost  function  (2.6)  if  the  initial  conditions  are  the  same. 

Now  add  the  delayed  reflection  term  and  recall  the  discussion  on  boundaries 
at  the  beginning  of  the  section.  The  sequence  {^’^’^’'^{■),m^’^{-),w^’'^{-),z^’^’^{-)) 
is  still  tight  and  the  limits  of  the  optimal  costs  for  the  chain  are  arbitrarily  close 
to  that  for  the  original  process  if  the  boundaries  are  large  enough. 

Proof.  With  the  preparation  in  Theorem  6.2  in  hand,  the  proof  follows  that 
in  [10,  Chapter  11]  closely  and  the  reader  is  referred  to  that  reference  for  more 
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detail.  Fix  a  control  sequence.  The  martingale  is  tight  in  the  Skorohod 

topology.  Then,  since  the  increments  are  0{h),  any  weak-sense  limit  has 

continuous  paths  with  probability  one.  As  noted  above  and  in  Section  5.  the 
proof  [10,  Section  10.4.1]  implies  that  there  is  a  martingale  satisfying 

(6.26),  modulo  an  asymptotically  negligible  error,  and  that  converges  weakly  to 
a  standard  vector-valued  Wiener  process.  The  error  is  due  to  the  o(At[()  term 
in  (6.4).  Lemma  5.1  is  applicable  and  implies  that  is  tight.  Again,  since 

the  are  0(h),  the  tightness  implies  that  all  weak-sense  limit  processes 

are  continuous  w.p.l.  These  facts  and  the  boundedness  of  $;'i’^’‘^(0)  implies  the 
tightness  of  and  the  asymptotic  continuity  of  any  weak-sense  limit.  Any 

sequence  of  relaxed  controls  is  tight. 

Now  extract  a  weakly  convergence  subsequence  with  limit  denoted  by  (x(-),  w(-), 
m{-),z{-)).  Then  the  proofs  in  [10,  Chapters  10,  11]  imply  that  (x(-),  w(-),  m(-),  z(-)) 
is  nonanticipative  with  respect  to  w(-),  that  m(-)  is  an  admissible  relaxed  con¬ 
trol,  and  that  the  set  satisfies  (2.3).  Also,  the  costs  for  the  chain  converge  to 
the  cost  for  the  limit  process.  If  we  let  be  an  optimal  control  for  the 

chain,  then  these  comments  imply  that  liminf^^i^o  m)  >  V{x,u).  The 

reverse  inequality  limsup^j  V^’^{x,u)  <  V{x,u)  is  proved  just  as  it  was  for 
the  no-delay  problem  in  the  reference.  The  presence  of  delays  does  not  change 
the  structure  and  the  details  are  omitted.  ■ 


7  Size  of  the  state  space  for  the  approximating 
chain. 

The  comments  concerning  dimension  and  memory  below  (3.3)  all  apply  to  the 
numerical  procedures.  Note  that  the  complexity  of  the  computation  of 
is  not  heavily  dependent  on  the  dimension  of  the  control  variable,  or  on  compo¬ 
nents  of  x{-)  that  do  not  have  delay  components.  If  the  control  and  reflection 
term  are  not  delayed,  then  [6]  discusses  useful  numerical  procedures,  based  on 
the  use  of  (2.2),  without  the  need  for  auxiliary  variables,  and  they  are  preferable 
for  such  problems.  Those  procedures  do  not  appear  to  be  useful  if  the  control 
and/or  the  reflection  processes  are  also  delayed.  In  particular,  one  would  have 
to  keep  track  of  the  values  of  the  control  the  reflection  terms  over  the  delay 
intervals  and  approximate  them  by  finite- valued  discrete-time  processes  that  do 
not  lose  too  much  information.  These  approximations  become  part  of  the  state 
space  of  the  approximating  chain,  usually  making  its  size  much  too  large.  For 
such  problems  the  approach  of  this  paper  is  quite  promising.  The  size  of  the 
state  space  is  the  product  of  what  is  needed  for  and  where  9 

takes  t/S  values. 

Consider  a  one-dimensional  problem  where  g{-)  =  p(-)  =  0  and  let  the  dis¬ 
cretization  level  for  the  £,^^'^{0)  be  h.  Then  there  is  a  AT  <  oo  such  that 
lx^(C  ^)l  <  K{t  +  9),—t  <  9  <0.  Without  loss  of  generality,  suppose  that  x(t) 
has  been  centered  so  that  it  lies  in  an  interval  [0,i?o]  for  some  Bq  <  oo.  The 
state  space  for  has  [Bo/h  +  3]  points.  There  are  t/S  values  for  9.  Thus,  if 
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we  bound  ^^’^’^{9)  by  K{t  +  9),  the  maximum  number  of  points  (which  includes 
the  reflecting  states)  is 


[Bo/h  +  3] 


Kt  K{t  —  S) 
h  h 


KS 

h 


While  large,  this  is  much  better  than  a  direct  procedure  (or  the  procedure  in 
[6])  when  there  are  delays  in  the  control.  We  will  now  improve  it  further. 

Let  h  =  6  and  suppose  that  b(-)  is  Lipschitz  continuous  in  0,  uniformly  in 
the  other  variables.  Then 


-  <5)  =  O(^)  +  noise,  (7.1) 

where  the  noise  is  due  to  the  randomization  and  has  variance  (Theorem  6.1) 
0(/iAt(()  =  0{h^).  Since  =  0,  the  state  space  can  consist  of  the 

differences  ^^’^’'^(0)  —  —  (5),  — r  +  5  <  0  <  0.  Recall  the  discussion  in 

Theorem  6.1  concerning  the  randomization  noise.  In  the  example  there,  when 
doing  the  randomization  to  attain  the  desired  conditional  mean,  the  value  h 
was  selected  with  a  probability  of  order  0{h),  and  the  value  zero  with  a  high 
probability.  This  implies  that,  with  a  high  probability,  the  number  of  selections 
of  the  value  h  between  advances  of  the  time  variable  is  bounded  by  some  small 
number.  Thus,  suppose  that  we  can  (with  a  high  enough  probability)  bound  the 
differences  in  (7.1)  by  H5,  for  some  integer  H.  Then  the  size  of  the  state  space 
is  [Bo/h  +  a  considerable  improvement.  More  care  is  needed  if  there  is 

a  there  is  a  or  Sy  term.  But  there  will  still  be  an  improvement.  Clearly, 
much  more  work  is  needed  on  the  algorithms  and  state  space  representations. 
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